RECOMBINANT PROTEIN PRODUCTION IN BOVINE ADENOVIRUS
EXPRESSION VECTOR SYSTEM

Technical Field 

The present invention relates to novel bovine
adenovirus (BAV) expression vector systems in which
one or both of the early region 1 (E1) and the early
region 3 (E3) gene deletions are replaced by a foreign
gene, and to novel recombinant mammalian cell lines stably
transformed with BAV E1 sequences which therefore
express E1 gene products, allowing a bovine
adenovirus with an E1 gene deletion replaced by a
foreign gene to replicate therein. These materials
are used in production of recombinant BAV expressing
heterologous (antigenic) polypeptides or fragments for
the purpose of live recombinant virus or subunit
vaccines or for other therapies.

Background of the Invention 

The adenoviruses cause enteric or
respiratory infection in humans as well as in domestic
and laboratory animals.

The bovine adenoviruses (BAVs) comprise at
least nine serotypes divided into two subgroups.
These subgroups have been characterized by
enzyme-linked immunoassays (ELISA), serologic studies
with immunofluorescence assays, virus-neutralization
tests and immunoelectron microscopy, and by their host
specificity and clinical syndromes. Subgroup 1
viruses include BAV 1, 2, 3 and 9 and grow relatively
well in established bovine cells compared to subgroup
2, which includes BAV 4, 5, 6, 7 and 8.

BAV3 was first isolated in 1965, is the
best characterized of the BAV genotypes and contains a
genome of approximately 35 kb (Kurokawa et al (1978)
J. Virol. 28:212-218). The locations of the hexon (Hu et
al (1984) J. Virol. 49:604-608) and proteinase (Cai et
al. (1990) Nuc. Acids Res. 18:5568) genes in the
BAV3 genome have been identified and sequenced.
However, the location and sequences of other genes
such as early region 1 (E1) and 3 (E3) in the BAV
genome have not been reported.

In the human adenovirus (HAd) genome there
are two important regions, E1 and E3, in which foreign
genes can be inserted to generate recombinant
adenoviruses (Berkner and Sharp (1984) Nuc. Acid Res.
12:1925-1941 and Haj-Ahmad and Graham (1986) J.
Virol. 57:267-274). E1 proteins are essential for
virus replication in tissue culture; however,
conditional-helper adenovirus recombinants containing
foreign DNA in the E1 region can be generated in a
cell line which constitutively expresses E1 (Graham et
al. (1977) J. Gen. Virol. 36:59-72). In contrast,
E3 gene products of HAd2 and HAd5 are not required
for in vitro or in vivo infectious virion production,
but have an important role in host immune responses to
virus infection (Andersson et al (1985) Cell 43:215-222;
Burgert et al (1987) EMBO J. 6:2019-2026; Carlin
et al (1989) Cell 57:135-144; Ginsberg et al (1989)
PNAS. USA 86:3823-3827; Gooding et al (1988) Cell
53:341-346; Tollefson et al (1991) J. Virol. 65:3095-3105;
Wold and Gooding (1989) Mol. Biol. Med. 6:433-452
and Wold and Gooding (1991) Virology 184:1-8).
The E3-19 kiloDalton (kDa) glycoprotein (gp19) of human
adenovirus type 2 (HAd2) binds to the heavy chain of a
number of class I major histocompatibility complex
(MHC) antigens in the endoplasmic reticulum, thus
inhibiting their transport to the plasma membrane
(Andersson et al. (1985) Cell 43:215-222; Burgert and
Kvist (1985) Cell 41:987-997; Burgert and Kvist
(1987) EMBO J. 6:2019-2026). The E3-14.7kDa protein
of HAd2 or HAd5 prevents lysis of virus-infected mouse
cells by tumor necrosis factor (TNF) (Gooding et al.
(1988) Cell 53:341-346). In addition, the E3-10.4kDa
and E3-14.5kDa proteins form a complex to induce
endosomal-mediated internalization and degradation of
the epidermal growth factor receptor (EGF-R) in
virus-infected cells (Carlin et al. (1989) Cell 57:135-144;
Tollefson et al. (1991) J. Virol. 65:3095-3105). The
helper-independent recombinant adenoviruses having
foreign genes in the E3 region replicate and express
very well in every permissive cell line (Chanda et al
(1990) Virology 125:535-547; Dewar et al (1989)
J. Virol. 63:129-136; Johnson et al (1988) Virology
164:1-14; Lubeck et al (1989) PNAS. USA 86:6763-6767;
McDermott et al (1989) Virology 169:244-247; Mittal et
al (1993) Virus Res. 28:67-90; Morin et al (1987)
PNAS. USA 84:4626-4630; Prevec et al (1990) J. Inf.
Dis. 161:27-30; Prevec et al (1989) J. Gen. Virol.
70:429-434; Schneider et al (1989) J. Gen. Virol.
70:417-427 and Yuasa et al (1991) J. Gen. Virol.
72:1927-1934). Based on the above studies and the
suggestion that adenoviruses can package approximately
105% of the wild-type (wt) adenovirus genome (Bett et
al (1993) J. Virol. 67:5911-5921 and Ghosh-Choudhury
et al (1987) EMBO J. 6:1733-1739), an insertion of up
to 1.8 kb foreign DNA can be packaged into adenovirus
particles for use as an expression vector for foreign
proteins without any compensating deletion.
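
As a rough check of the 1.8 kb figure, the arithmetic can be sketched as follows. The 105% packaging limit is the value cited above; the wild-type genome length of about 36 kb is an assumption made for illustration and is not stated in the text, and the function name is hypothetical.

```python
# Sketch: spare packaging capacity of an adenovirus vector.
# ASSUMPTION: wild-type genome length of ~36 kb (not stated in the text).
WT_GENOME_KB = 36.0
PACKAGING_LIMIT = 1.05  # ~105% of the wt genome, as cited above


def insert_capacity_kb(wt_genome_kb, packaging_limit=PACKAGING_LIMIT, deleted_kb=0.0):
    """Largest foreign insert (kb) that still fits within the packaging limit."""
    spare = wt_genome_kb * packaging_limit - wt_genome_kb
    return spare + deleted_kb


# No compensating deletion: only the ~5% headroom is available.
print(round(insert_capacity_kb(WT_GENOME_KB), 2))
# With a compensating deletion, the deleted length adds to the capacity.
print(round(insert_capacity_kb(WT_GENOME_KB, deleted_kb=0.696), 3))
```

Under these assumptions the headroom without any deletion is about 1.8 kb, and any deleted sequence adds directly to the capacity.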

It is assumed that an indigenous adenovirus
vector would be better suited for use as a live
recombinant virus vaccine in different animal species
compared to an adenovirus of human origin. Non-human
adenovirus-based expression vectors have not been
reported so far. If, like the HAd E3, the E3 regions in
other adenoviruses are not essential for virus
replication in cultured cells, adenovirus recombinants
containing foreign gene inserts in the E3 region could
be generated.

BAV3 is a common pathogen of cattle usually
resulting in subclinical infection though occasionally
associated with a more serious respiratory tract
infection (Darbyshire et al., 1966 Res. Vet Sci 7:81-93;
Mattson et al., 1988 J. Vet Res 49:67-69). BAV3
can produce tumors when injected into hamsters
(Darbyshire, 1966 Nature 211:102) and viral DNA can
efficiently effect morphological transformation of
mouse, hamster or rat cells in culture (Tsukamoto and
Sugino, 1972 J. Virol. 9:465-473; Motoi et al., 1972
Gann 63:415-418; M. Hitt, personal communication).
Cross hybridization was observed between BAV3 and
human adenovirus type 2 (HAd2) (Hu et al., 1984 J.
Virol. 49:604-608) in most regions of the genome
including some regions near but not at the left end of
the genome.

The E1A gene products of the group C human
adenoviruses have been very extensively studied and
shown to mediate transactivation of both viral and
cellular genes (Berk et al., 1979 Cell 17:935-944;
Jones and Shenk, 1979 Cell 16:683-689; Nevins, 1981
Cell 26:213-220; Nevins, 1982 Cell 29:913-919;
reviewed in Berk, 1986 Ann. Rev. Genet. 20:45-79), to
effect transformation of cells in culture (reviewed in
Graham, F.L. (1984) "Transformation by and
oncogenicity of human adenoviruses." In: The
Adenoviruses, H.S. Ginsberg, Editor. Plenum Press,
New York; Branton et al., 1985 Biochim. Biophys. Acta
780:67-94) and induce cell DNA synthesis and mitosis
(Zerler et al., 1987 Mol. Cell Biol. 7:821-929; Bellet
et al., 1989 J. Virol. 63:303-310; Howe et al., 1990
PNAS. USA 87:5883-5887; Howe and Bayley, 1992 Virology
186:15-24). The E1A transcription unit comprises two
coding sequences separated by an intron region which
is deleted from all processed E1A transcripts. In the
two largest mRNA species produced from the E1A
transcription unit, the first coding region is
further subdivided into exon 1, a sequence found in
both the 12s and 13s mRNA species, and the unique
region, which is found only in the 13s mRNA species.
By comparisons between E1A proteins of human and simian
adenoviruses, three regions of somewhat conserved
protein sequence (CR) have been defined (Kimelman et
al., 1985 J. Virol. 53:399-409). CR1 and CR2 are encoded in
exon 1, while CR3 is encoded in the unique sequence
and a small portion of exon 2. Binding sites for a
number of cellular proteins including the
retinoblastoma protein Rb, cyclin A and an associated
protein kinase p33cdk2, and other, as yet unassigned,
proteins have been defined in exon 1 encoded regions
of E1A proteins (Yee and Branton, 1985 Virology
147:142-153; Harlow et al., 1986 Mol. Cell Biol.
6:1579-1589; Barbeau et al., 1992 Biochem. Cell Biol.
70:1123-1134). Interaction of E1A with these cellular
proteins has been implicated as the mechanism through
which E1A participates in immortalization and
oncogenic transformation (Egan et al, 1989 Oncogene
4:383-388; Whyte et al., 1988 Nature 334:124-129;
Whyte et al, 1988 J. Virol. 62:257-265). While E1A
alone may transform or immortalize cells in culture,
the coexpression of both E1A and either the E1B-19k
protein or the E1B-55k protein separately or together
is usually required for high frequency transformation
of rodent cells in culture (reviewed in Graham, 1984
supra; Branton et al., 1985 supra; McLorie et al.,
1991 J. Gen Virol. 72:1467-1471).

Transactivation of other viral early genes
in permissive infection of human cells is principally
mediated by the amino acid sequence encoded in the CR3
region of E1A (Lillie et al., 1986 Cell 46:1043-1051).
Conserved cysteine residues in a Cys-X2-Cys-X13-Cys-X2-Cys
sequence motif in the unique region are associated
with metal ion binding activity (Berg, 1986 supra) and
are essential for transactivation activity (Jelsma
et al., 1988 Virology 163:494-502; Culp et al., 1988
PNAS. USA 85:6450-6454). As well, the amino acids in
CR3 which are immediately amino (N)-terminal to the
metal binding domain have been shown to be important
in transcription activation, while those immediately
carboxy (C)-terminal to the metal binding domain are
important in forming associations with the promoter
region (Lillie and Green, 1989 Nature 338:39-44; see
Fig. 3).
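
The cysteine spacing described above can be expressed as a simple pattern search over a protein sequence in one-letter code. The following is a minimal sketch only: the function name and the test peptide are hypothetical, and the peptide is synthetic rather than an actual E1A sequence.

```python
import re

# Cys-X2-Cys-X13-Cys-X2-Cys in one-letter amino acid code,
# where X is any residue.
CR3_METAL_MOTIF = re.compile(r"C.{2}C.{13}C.{2}C")


def find_metal_binding_motif(protein):
    """Return (start, end) of the first match (0-based, end exclusive), or None."""
    match = CR3_METAL_MOTIF.search(protein.upper())
    return match.span() if match else None


# Synthetic test peptide (hypothetical, not a real E1A sequence).
demo = "MA" + "C" + "AA" + "C" + "G" * 13 + "C" + "SS" + "C" + "KL"
print(find_metal_binding_motif(demo))
```

A match only shows that the four cysteines have the stated spacing; it says nothing about whether the region actually binds a metal ion.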

The application of genetic engineering has
resulted in several attempts to prepare adenovirus
expression systems for obtaining vaccines. Examples
of such research include the disclosures in U.S.
patent 4,510,245 on an adenovirus major late promoter
for expression in a yeast host; U.S. patent 4,920,209
on a live recombinant adenovirus type 7 with a gene
coding for hepatitis-B surface antigen located at a
deleted early region 3; European patent 389 286 on a
non-defective human adenovirus 5 recombinant
expression system in human cells for HCMV major
envelope glycoprotein; WO 91/11525 on live
non-pathogenic immunogenic viable canine adenovirus in a
cell expressing E1a proteins; and French patent 2 642 767
on vectors containing a leader and/or promoter from
the E3 of adenovirus 2.

The selection of a suitable virus to act as
a vector for foreign gene expression, and the
identification of a suitable non-essential region as a
site for insertion of the gene, pose a challenge. In
particular, the insertion site must be non-essential
for the viable replication of the virus and its
effective operation in tissue culture and also in
vivo. Moreover, the insertion site must be capable of
accepting new genetic material, whilst ensuring that
the virus continues to replicate. An essential region
of a virus genome can also be utilized for foreign
gene insertion if the recombinant virus is grown in a
cell line which complements the function of that
particular essential region in trans.

The present inventors have now identified
suitable regions in the BAV genome and have succeeded
in inserting foreign genes to generate BAV
recombinants.

Disclosure of the Invention

The present invention relates to novel
bovine adenovirus expression vector systems in which
part or all of one or both of the E1 and E3 gene
regions are deleted, and to recombinant mammalian cell
lines of bovine origin which are transformed with the BAV E1
sequences and thus constitutively express the E1
gene products, allowing a bovine adenovirus having a
deletion of part or all of the E1 gene region replaced
by a heterologous nucleotide sequence encoding a
foreign gene or fragment thereof to replicate therein,
and to the use of these materials in production of
heterologous (antigenic) polypeptides or fragments
thereof.

The invention also relates to a method of
preparing a live recombinant virus or subunit vaccines
for producing antibodies or cell mediated immunity to
an infectious organism in a mammal, such as a bovine,
which comprises inserting into the bovine adenovirus
genome the gene or fragment coding for the antigen
which corresponds to said antibodies or induces said
cell mediated immunity, together with or without an
effective promoter therefor, to produce BAV
recombinants.

Generally, the foreign gene construct is
cloned into a nucleotide sequence which represents
only a part of the entire viral genome having one or
more appropriate deletions. This chimeric DNA
sequence is usually present in a plasmid which allows
successful cloning to produce many copies of the
sequence. The cloned foreign gene construct can then
be included in the complete viral genome, for example,
by in vivo recombination following a DNA-mediated
cotransfection technique. Multiple copies of a coding
sequence or more than one coding sequence can be
inserted so that the recombinant vector can express
more than one foreign protein. The foreign gene can
have additions, deletions or substitutions to enhance
expression and/or immunological effects of the
expressed protein.

The invention also includes an expression
system comprising a bovine adenovirus expression
vector wherein heterologous nucleotide sequences, with
or without any exogenous regulatory elements, replace
the E1 gene region and/or part or all of the E3 gene
region.

The invention also includes (A) a
recombinant vector system comprising the entire BAV
DNA and a plasmid or two plasmids capable of
generating a recombinant virus by in vivo
recombination following cotransfection of a suitable
cell line, comprising BAV DNA representing the entire
wild-type BAV genome and a plasmid comprising bovine
adenovirus left or right end sequences containing the
E1 or E3 gene regions, respectively, with a
heterologous nucleotide sequence encoding a foreign
gene or fragment thereof substituted for part or all
of the E1 or E3 gene regions; (B) a live recombinant
bovine adenovirus vector (BAV) system selected from
the group consisting of: (a) a system wherein part or
all of the E1 gene region is replaced by a
heterologous nucleotide sequence encoding a foreign
gene or fragment thereof; (b) a system wherein part
or all of the E3 gene region is replaced by a
heterologous nucleotide sequence encoding a foreign
gene or fragment thereof; and (c) a system wherein
part or all of the E1 gene region and part or all of
the E3 gene region are deleted and a heterologous
nucleotide sequence encoding a foreign gene or
fragment thereof is inserted into at least one of the
deletions; (C) a recombinant bovine adenovirus (BAV)
comprising a deletion of part or all of the E1 gene
region, a deletion of part or all of the E3 gene region or
deletion of both, and inserted into at least one
deletion a heterologous nucleotide sequence coding for
an antigenic determinant of a disease causing
organism; (D) a recombinant bovine adenovirus
expression system comprising a deletion of part or all
of E1, a deletion of part or all of E3, or both
deletions, and inserted into at least one deletion a
heterologous nucleotide sequence coding for a foreign
gene or fragment thereof under control of an
expression promoter; (E) a recombinant bovine
adenovirus (BAV) for producing an immune response in a
mammalian host comprising: (1) a BAV recombinant
containing a heterologous nucleotide sequence coding
for an antigenic determinant needed to obtain the
desired immune response in association with or without
(2) an effective promoter to provide expression of
said antigenic determinant in immunogenic quantities
for use as a live recombinant virus or recombinant
protein or subunit vaccine; and (F) a mutant bovine
adenovirus (BAV) comprising a deletion of part or all
of E1 and/or a deletion of part or all of E3.

The invention also includes recombinant mammalian cell
lines stably transformed with BAV E1 gene region
sequences, said recombinant cell lines thereby being
capable of allowing replication therein of a bovine
adenovirus comprising a deletion of part or all of the
E1 or E3 gene regions replaced by a heterologous or
homologous nucleotide sequence encoding a foreign gene
or fragment thereof. The invention also includes
production, isolation and purification of polypeptides
or fragments thereof, such as growth factors, receptors
and other cellular proteins, from recombinant bovine
cell lines expressing BAV E1 gene products.

The invention also includes a method for
providing gene therapy to a mammal in need thereof to
control a gene deficiency which comprises
administering to said mammal a live recombinant bovine
adenovirus containing a foreign nucleotide sequence
encoding a non-defective form of said gene under
conditions wherein the recombinant virus vector genome
is incorporated into said mammalian genome or is
maintained independently and extrachromosomally to
provide expression of the required gene in a target
organ or tissue.

Another aspect of the invention provides a
virus vaccine composition which comprises the
recombinant virus or recombinant protein in
association with or without a pharmaceutically
acceptable carrier. The recombinant virus vaccine can
be formulated for administration by an oral dosage
(e.g. as an enteric coated tablet), by injection or
otherwise. More specifically, these include a vaccine
for protecting a mammalian host against infection
comprising a live recombinant adenovirus or
recombinant protein produced by the recombinant
adenovirus of the invention wherein the foreign gene
or fragment encodes an antigen and formulated with or
without a pharmaceutically acceptable carrier.

The invention also includes methods of
producing antibodies or cell mediated immunity in a
mammal including (1) a method for eliciting an immune
response in a mammalian host against an infection
comprising: administering a vaccine comprising a live
BAV recombinant of the invention wherein the foreign
gene or fragment encodes an antigen with or without a
pharmaceutically acceptable carrier, and (2) a method
for eliciting an immune response in a mammalian host
against an infection comprising: administering a
vaccine comprising a recombinant antigen prepared by
culturing a BAV recombinant wherein the foreign gene
or fragment encodes the desired antigen with or
without a pharmaceutically acceptable carrier.

The following disclosure will render these
and other embodiments of the present invention readily
apparent to those of skill in the art. While the
disclosure often refers to bovine adenovirus type 3
(BAV3), it should be understood that this is for the
purpose of illustration and that the same features
apply to bovine adenovirus of the other types, 1, 2, 4,
5, 6, 7, 8 and 9, and the invention described and
claimed herein is intended to cover all of these
bovine adenovirus types.

Brief Description of the Drawings 
Figure 1. Sequence and major open reading
frames of the left 11% of the BAV3 genome. The region
comprises the E1 and protein IX transcription region.
The 195 nucleotide inverted terminal repeat sequence
identified by Shinagawa et al., 1987 Gene 55:85-93 is
shown in italics. The amino acid sequences for the
largest E1A protein, two E1B proteins and protein IX
are presented. The probable splice donor ([), splice
acceptor (]) and intron sequence (underlined italics)
within the E1A region are marked. A 35 base pair
repeat sequence between E1A and E1B is indicated in
bold underline. Possible transcription promoter TATA
sequences and possible poly A addition sequences AATAA
are also indicated.

Figure 2. Regions of homology in the E1A
proteins of BAV3 and human adenovirus type 5 (HAd5).
The amino acid residue number of each serotype is indicated.
A. Conserved region 3 (CR3) of HAd5 subdivided into
three functional regions as defined by Lillie et al
(1989) Nature 338:39-44 and described in the
Background of the Invention. The intron sequence of
BAV3 E1A occurs within the serine amino acid codon at
position 204. B. A portion of conserved region 2
(CR2) of HAd5, showing the residues thought to be
important in the binding of retinoblastoma protein Rb
(Dyson et al., 1990 J. Virol. 64:1353-1356), and the
comparable sequence from BAV3.

Figure 3. Homology regions between the HAd5
E1B 19k (176R) protein and the corresponding BAV3
(157R) protein. The amino acid residue number for
each of the viruses is indicated.

Figure 4. The C-terminal 346R of HAd5
E1B 56k (496R) and the corresponding BAV3 protein
(420R). The HAd5 protein comparison begins at
residue 150 and the BAV3 (in italics) at residue 74.
The amino terminal regions of these proteins which are
not presented show no significant homology.

Figure 5. Homology comparison of the amino
acid sequence of HAd5 protein IX and the corresponding
protein of BAV3 (in italics).

Figure 6. The genome of BAV3 showing the
location of EcoRI, XbaI and BamHI sites and the
structure of the 5100 bp segment from 77 to 92 m.u.
ORFs for the upper strand which can encode 60 amino
acids or more are represented by bars. Shaded
portions indicate regions of similarity to pVIII,
14.7K E3 and fibre proteins of HAd2 or -5. The first
methionine followed by a stretch of amino acids of at
least 50 is shown by an open triangle. Termination
codons for ORFs likely to code for viral proteins are
shown by closed triangles.

Figure 7. Nucleotide sequence of BAV3
between 77 and 92 m.u. showing ORFs that have the
potential to encode polypeptides of at least 50 amino
acids after the initiating methionine. The nucleotide
sequence was analyzed using the program DISPCOD
(PC/GENE). Potential N-glycosylation sites (N-X-T/S)
and polyadenylation signals are underlined and the
first methionine of each ORF is shown in bold.
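
The N-X-T/S pattern used to mark potential N-glycosylation sites can be located in a translated ORF with a short pattern search. This is an illustrative sketch and not the DISPCOD procedure: the function name and test peptide are hypothetical, and the optional exclusion of proline at position X is a commonly applied extra rule that is assumed here, not taken from the legend.

```python
import re


def find_n_glycosylation_sites(protein, exclude_proline=False):
    """Return 1-based positions of Asn residues in N-X-T/S sequons.

    exclude_proline=True applies the common extra rule that X is not Pro;
    that rule is an ASSUMPTION and is not stated in the figure legend.
    """
    x = "[^P]" if exclude_proline else "."
    # Lookahead so that overlapping sequons (e.g. NNTS) are all reported.
    pattern = re.compile(r"(?=N" + x + r"[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


demo = "MNVTAANPSGNNTS"  # hypothetical peptide, not a BAV3 sequence
print(find_n_glycosylation_sites(demo))
print(find_n_glycosylation_sites(demo, exclude_proline=True))
```

The lookahead keeps overlapping sites, which a plain non-overlapping search would miss.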

Figure 8. Comparison between the predicted
amino acid sequences for the ORFs of BAV3 and known
proteins of HAd2 or -5 using the computer program
PALIGN (PC/GENE), with comparison matrix
structural-genetic matrix; open gap cost 6; unit gap cost 2.
Identical residues are indicated by a colon and
similar residues by a dot. (a) Comparison between the
predicted amino acid sequence encoded by the 3' end of
BAV3 ORF 1 and the HAd2 hexon-associated pVIII
precursor. (b) Comparison between the ORF 4 and the
HAd5 14.7K E3 protein. (c) Comparison between the
predicted amino acid sequence encoded by BAV3 ORF 6
and the HAd2 fibre protein.

Figure 9. Construction of BAV3 E3 transfer
vector containing the firefly luciferase gene. The
3.0 kb BamHI 'D' fragment of the BAV3 genome, which
falls between m.u. 77.8 and 86.4, contains almost the
entire E3 region (Mittal et al (1992) J. Gen. Virol.
73:3295-3300). This 3.0 kb fragment was isolated by
digesting BAV3 DNA with BamHI and cloned into pUC18 at
the BamHI site to obtain pSM14. Similarly, the 4.8 kb
BamHI 'C' fragment of BAV3 DNA which extends between
m.u. 86.4 and 100 was isolated and inserted into pUC18
to produce pSM17. To delete a 696 bp XhoI-NcoI
fragment, pSM14 was cleaved with XhoI and NcoI, the
larger fragment was purified, the ends were made
blunt with Klenow fragment of DNA polymerase I and
a NruI-SalI linker was inserted to
generate pSM14del2. A 2.3 kb BamHI fragment
containing BAV3 sequences, an E3 deletion and NruI and
SalI cloning sites was inserted into pSM17 at the
BamHI site to obtain pSM41; however, this step was not
required for construction of a BAV3 E3 transfer
vector. A 1716 bp fragment containing the firefly
luciferase gene (de Wet et al (1987) Mol. Cell. Biol.
7:725-737) was isolated by digesting pSVOA/L (provided
by D. R. Helinski, University of California at San
Diego, CA) with BsmI and SspI as described (Mittal et
al (1993) Virus Res. 28:67-90), and the ends were made
blunt with Klenow. The luciferase gene was inserted
into pSM41 at the SalI site by blunt end ligation.
The resultant plasmid was named pSM41-Luc, which
contained the luciferase gene in the same orientation
as the E3 transcription unit. The plasmid pKN30 was
digested with XbaI and inserted into pSM41-Luc
(partially cleaved with XbaI) at a XbaI site present
within the luciferase gene to obtain pSM41-Luc-Kan.
The plasmid pSM14 was digested with BamHI and a 3.0 kb
fragment was isolated and inserted into pSM17 at the
BamHI site to generate pSM43. The 18.5 kb XbaI 'A'
fragment of the BAV3 genome which falls between
m.u. 31.5 and 84.3 was cloned into pUC18 at the XbaI
site to produce pSM21. A 18.5 kb XbaI fragment was
purified from pSM21 after cleavage with XbaI and
inserted into pSM43 at the XbaI site and the resultant
plasmid was named pSM51. A
7.7 kb BamHI fragment containing the luciferase gene
and kan^r gene was isolated after digesting
pSM41-Luc-Kan with BamHI and ligated to pSM51, partially
digested with BamHI, to isolate pSM51-Luc-Kan in the
presence of ampicillin and kanamycin. Finally the kan^r
gene was deleted from pSM51-Luc-Kan by partial
cleavage with XbaI and religation to obtain pSM51-Luc.
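
The fragment sizes quoted in this legend can be cross-checked against their map-unit (m.u.) coordinates. The sketch below assumes that one map unit is 1% of the genome and uses the approximate 35 kb BAV3 genome length given in the Background; the function name is hypothetical and the results are only as exact as that approximate length.

```python
# Sketch: convert genome map units (m.u.) into fragment sizes.
# ASSUMPTION: 1 m.u. = 1% of a ~35 kb BAV3 genome (approximate length).
GENOME_KB = 35.0


def fragment_size_kb(start_mu, end_mu, genome_kb=GENOME_KB):
    """Size in kb of the segment between two map-unit coordinates."""
    return (end_mu - start_mu) / 100.0 * genome_kb


# Coordinates as given in the legend above.
fragments = {
    "BamHI 'D' (pSM14)": (77.8, 86.4),
    "BamHI 'C' (pSM17)": (86.4, 100.0),
    "XbaI 'A' (pSM21)": (31.5, 84.3),
}
for name, (start, end) in fragments.items():
    print(f"{name}: {fragment_size_kb(start, end):.1f} kb")
```

With these assumptions the three segments come out close to the 3.0 kb, 4.8 kb and 18.5 kb sizes stated in the legend.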
Figure 10. Generation of BAV3 recombinants
containing the firefly luciferase gene in the E3 region.
The plasmid pSM51-Luc contains the BAV3 genome between
m.u. 77.8-84.3 and 31.5-100, a 696 bp deletion in E3
and the luciferase gene in E3 in the E3 parallel
orientation. The BAV3 genome digested with PvuI and
uncut pSM51-Luc were used for cotransfection of MDBK
cells transformed with a plasmid containing BAV3 E1
sequences to rescue the luciferase gene in E3 of the
BAV3 genome by in vivo recombination. The resulting
BAV3-luciferase recombinants (BAV3-Luc) isolated from
two independent experiments were named BAV3-Luc (3.1)
and BAV3-Luc (3.2). The BamHI restriction map of the
BAV3-Luc genome is shown. The position and
orientation of the firefly luciferase gene is shown as
a hatched arrow.

Figure 11. Southern blot analyses of
restriction enzyme-digested DNA fragments of the wt
BAV3 or recombinant genomes using a 696 bp XhoI-NcoI
fragment from pSM14 (Fig. 9) and a DNA fragment
containing the luciferase gene as probes. 100 ng of DNA
isolated from mock (lanes 1, 2, 3), BAV3-Luc (3.1)
(lanes 4, 5, 6), BAV3-Luc (3.2) (lanes 7, 8, 9) or wt
BAV3 (lanes 10, 11, 12)-infected MDBK cells was
digested with BamHI (lanes 1, 4, 7, 10), EcoRI (lanes
2, 5, 8, 11) or XbaI (lanes 3, 6, 9, 12) and analyzed
by agarose gel electrophoresis. The DNA fragments
from the gel were transferred onto a GeneScreenPlus™
membrane and hybridized with a 696 bp XhoI-NcoI
fragment from pSM14 (Fig. 9) labeled with 32P using
a Pharmacia Oligolabeling Kit (panel A). The panel B
blot represents duplicate
samples as in panel A but was probed with a 1716 bp
BsmI-SspI fragment containing the luciferase gene
(Fig. 9). The sizes of bands visualized following
hybridization are shown in kb on the right in panel A
and on the left in panel B.
B: BamHI, E: EcoRI, Xb: XbaI, 3.1: BAV3-Luc (3.1),
3.2: BAV3-Luc (3.2) and wt: wild-type BAV3.

Figure 12. Single step growth curve for wt
BAV3 and BAV3-Luc. Confluent monolayers of MDBK cells
in 25 mm multi-well culture plates were inoculated
with the wt BAV3, BAV3-Luc (3.1) or BAV3-Luc (3.2) at
a m.o.i. of 10 p.f.u. per cell. The virus was allowed
to adsorb for 1 h at 37°C, cell monolayers were washed
3 times with PBS++ (0.137 M NaCl, 2.7 mM KCl, 8 mM
Na2HPO4, 1.5 mM KH2PO4, containing 0.01% CaCl2.2H2O &
0.01% MgCl2.6H2O) and incubated at 37°C in 1 ml
maintenance medium containing 2% horse serum. At
various times post-infection, cells were harvested
along with the supernatant, frozen and thawed three
times and titrated on MDBK cells by plaque assay.
Results are the means of duplicate samples.
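
For a given multiplicity of infection (m.o.i.), the volume of virus stock to add per well follows from the cell number and the stock titre. The sketch below is illustrative only: the cell count per well and the stock titre are hypothetical values that are not given in the legend, and the function name is an assumption.

```python
def inoculum_volume_ml(moi, cells_per_well, titer_pfu_per_ml):
    """Volume of virus stock (ml) giving `moi` p.f.u. per cell in one well."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * cells_per_well / titer_pfu_per_ml


# HYPOTHETICAL values: 1e6 cells per well and a 1e9 p.f.u./ml stock.
# The m.o.i. of 10 p.f.u. per cell is the value stated in the legend.
print(inoculum_volume_ml(moi=10, cells_per_well=1e6, titer_pfu_per_ml=1e9))
```

The same calculation applies at any other m.o.i. by changing the first argument.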

Figure 13. Kinetics of luciferase
expression in MDBK cells infected with BAV3-Luc.
Confluent MDBK cell monolayers in 25 mm multi-well
culture plates were infected with BAV3-Luc (3.1) or
BAV3-Luc (3.2) at a m.o.i. of 50 p.f.u. per cell. At
indicated time points post-infection, virus-infected
cells were harvested and assayed in duplicate for
luciferase activity.

Figure 14. Luciferase expression in the
presence of 1-β-D-arabinofuranosyl cytosine (AraC) in
MDBK cells infected with BAV3-Luc. Confluent MDBK
cell monolayers in 25 mm multi-well culture plates
were infected with A) BAV3-Luc (3.1) or B) BAV3-Luc
(3.2) at a m.o.i. of 50 p.f.u. per cell and incubated
in the absence or presence of 50 µg AraC per ml of
maintenance medium. At indicated time points
post-infection, virus-infected cells were harvested and
assayed in duplicate for luciferase activity.

Figure 15. Transcription maps of the wt
BAV3 and BAV3-Luc genomes in the E3 region. The
genome of wt BAV3 between m.u. 77 and 82 is shown
which represents the E3 region. The location of XhoI
and NcoI sites which were used to make an E3 deletion
are shown. (a) The three frames (F1, F2 and F3)
representing the open reading frames (ORFs) in the
upper strand of the wt BAV3 genome in the E3 region
are represented by bars. The shaded portions indicate
regions of similarities to pVIII and E3-14.7 kDa
proteins of HAd5. The positions of the initiation and
termination codons for ORFs likely to code for viral
proteins are shown by open and closed triangles,
respectively. (b) The predicted ORFs for the upper
strand in E3 of the BAV3-Luc genome are shown after a
696 bp XhoI-NcoI E3 deletion replaced by the
luciferase gene. The ORFs for pVIII and E3-14.7 kDa
proteins are intact. The transcription map of the wt
BAV3 E3 was adapted from the DNA sequence submitted to
the GenBank database under accession number D16839.

Figure 16. Western blot analysis of virus-infected
MDBK cells using an anti-luciferase antibody.
Confluent monolayers of MDBK cells were mock-infected
(lane 1) or infected with the wt BAV3 (lane 2),
BAV3-Luc (3.1) (lane 3) or BAV3-Luc (3.2) (lane 4) at a
m.o.i. of 50 p.f.u. per cell and harvested at 18 h
post-infection. Cell extracts were prepared and analyzed by
SDS-PAGE and Western blotting using a rabbit
anti-luciferase antibody. Purified firefly luciferase was
used as a positive control (lane 5). Lane 5 was
excised to obtain a shorter exposure. The protein
molecular weight markers in kDa are shown on the left.
The arrow indicates the 62 kDa luciferase bands that
reacted with the anti-luciferase antibody.
wt: wild-type BAV3, 3.1: BAV3-Luc (3.1) and 3.2:
BAV3-Luc (3.2).
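
The multiplicity of infection quoted in this legend (50 p.f.u. per cell) fixes how much virus stock is added to each monolayer. The following is a minimal sketch of that arithmetic only; the cell count per well and the stock titer are hypothetical values chosen for illustration and do not come from the specification.

```python
def inoculum_volume_ml(cells_per_well, moi, titer_pfu_per_ml):
    """Return the volume of virus stock (ml) that delivers `moi` p.f.u. per cell."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    total_pfu = cells_per_well * moi  # p.f.u. needed for the whole monolayer
    return total_pfu / titer_pfu_per_ml


# Hypothetical inputs (assumptions, not values from the text):
# 1e6 cells per well and a stock titer of 1e9 p.f.u./ml.
volume = inoculum_volume_ml(cells_per_well=1_000_000, moi=50, titer_pfu_per_ml=1e9)
print(f"Inoculum per well: {volume * 1000:.0f} microlitres")
```

Only the m.o.i. of 50 is taken from the text; any real calculation would use the measured cell number and the titer determined by plaque assay.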
Figure 17. Construction of pSM71-neo. An
8.4 kb SalI fragment of the BAV3 genome which falls
between m.u. 0 and 24 was isolated and inserted into
pUC19 at the SalI-SmaI site to generate pSM71. The
plasmid, pRSDneo (Fitzpatrick et al (1990) Virology
176:145-157) contains the neomycin-resistant (neo^r) gene
flanked with the simian virus 40 (SV40) regulatory
sequences originally from the plasmid, pSV2neo
(Southern et al (1982) J. Mol. Appl. Genet. 1:327-341)
after deleting a portion of the SV40 sequences
upstream of the neo^r gene to remove several false
initiation codons. A 2.6 kb fragment containing the
neo^r gene under the control of the SV40 regulatory
sequences was obtained from the plasmid, pRSDneo
after digestion with BamHI and BglII, and cloned into
pSM71 at the SalI site by blunt end ligation to obtain
pSM71-neo, containing the neo^r gene in the E1 parallel
orientation.

Figure 18. Construction of pSM61-kan1 and
pSM61-kan2. An 11.9 kb BglII fragment of the BAV3
genome which extends between m.u. 0 and 34 was
purified and introduced into pUC19 at the BamHI-HincII
site to obtain pSM61. The plasmid, pKN30 contains the
neo^r gene along with SV40 promoter and polyadenylation
sequences from the plasmid pSV2neo without any
modification. The entire pKN30 plasmid was inserted
into pSM61 at the SalI site to generate pSM61-kan1
having the neo^r gene in the E1 anti-parallel
orientation and pSM61-kan2 when the neo^r gene is in the
E1 parallel orientation.

Figure 19. Construction of an E1 transfer
plasmid containing the beta-galactosidase gene.

The plasmid, pSM71 which contains the BAV3
genome between m.u. 0 and 24, was cleaved with ClaI
and partially with AvrII to delete a 2.6 kb AvrII-ClaI
fragment (between m.u. 1.3 and 8.7) which falls within
the E1 region. A 0.5 kb fragment containing the SV40
promoter and polyadenylation sequences was obtained
from pFG144K5-SV by digesting with XbaI and inserted
into pSM71 to replace the 2.6 kb deletion to generate
pSM71-del1-SV. A 3.26 kb fragment containing the
bacterial beta-galactosidase gene was isolated from
pDUC/Z (Liang et al (1993) Virology 195:42-50) after
cleavage with NcoI and HindIII and cloned into
pSM71-del1-SV at the BamHI site to put the
beta-galactosidase gene under the control of the SV40
regulatory sequences to obtain pSM71-Z.
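
The fragment sizes quoted in the legends for Figures 17 to 19 can be cross-checked against their map-unit (m.u.) coordinates, where 100 m.u. spans the whole genome. The sketch below assumes the approximate genome length of 35 kb given in the Background; the function names are illustrative only.

```python
GENOME_KB = 35.0  # approximate BAV3 genome length stated in the Background (assumed exact here)


def mu_to_kb(map_units, genome_kb=GENOME_KB):
    """Convert a span in map units (percent of genome length) to kilobases."""
    return map_units * genome_kb / 100.0


def interval_kb(start_mu, end_mu, genome_kb=GENOME_KB):
    """Length in kb of the genome segment between two map-unit coordinates."""
    return mu_to_kb(end_mu - start_mu, genome_kb)


# Segments named in the figure legends, with the sizes the text quotes for them.
segments = [
    ("SalI fragment, m.u. 0-24 (quoted 8.4 kb)", 0.0, 24.0),
    ("BglII fragment, m.u. 0-34 (quoted 11.9 kb)", 0.0, 34.0),
    ("AvrII-ClaI deletion, m.u. 1.3-8.7 (quoted 2.6 kb)", 1.3, 8.7),
]
for label, start, end in segments:
    print(f"{label}: {interval_kb(start, end):.2f} kb")
```

With a 35 kb genome the three computed lengths agree with the quoted sizes to within rounding, which is a useful sanity check on the OCR-recovered coordinates.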


Modes of Carrying Out the Invention 

The practice of the present invention will
employ, unless otherwise indicated, conventional
microbiology, immunology, virology, molecular biology,
and recombinant DNA techniques which are within the
skill of the art. These techniques are fully
explained in the literature. See, e.g., Maniatis et
al., Molecular Cloning: A Laboratory Manual (1982);
DNA Cloning: A Practical Approach, vols. I & II (D.
Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed.
(1984)); Nucleic Acid Hybridization (B. Hames & S.
Higgins, eds. (1985)); Transcription and Translation
(B. Hames & S. Higgins, eds. (1984)); Animal Cell
Culture (R. Freshney, ed. (1986)); Perbal, A Practical
Guide to Molecular Cloning (1984); Sambrook et al.,
Molecular Cloning: A Laboratory Manual (2nd Edition),
vols. I, II & III (1989).

A. Definitions 
In describing the present invention, the
following terminology, as defined below, will be used.

A "replicon" is any genetic element (e.g.,
plasmid, chromosome, virus) that functions as an
autonomous unit of DNA replication in vivo; i.e., is
capable of replication under its own control.

A "vector" is a replicon, such as a plasmid,
phage, cosmid or virus, to which another DNA segment
may be attached so as to bring about the replication
of the attached segment.

By "live virus" is meant, in
contradistinction to "killed" virus, a virus which is
capable of producing identical progeny in tissue
culture and inoculated animals.

A "helper-free virus vector" is a vector 
that does not require a second virus or a cell line to 
supply something defective in the vector. 

A "double-stranded DNA molecule" refers to
the polymeric form of deoxyribonucleotides (adenine,
guanine, thymine, or cytosine) in its normal,
double-stranded helix. This term refers only to the primary
and secondary structure of the molecule, and does not
limit it to any particular tertiary forms. Thus, this
term includes double-stranded DNA found, inter alia,
in linear DNA molecules (e.g., restriction fragments
of DNA from viruses, plasmids, and chromosomes). In
discussing the structure of particular double-stranded
DNA molecules, sequences may be described herein
according to the normal convention of giving only the
sequence in the 5' to 3' direction along the
nontranscribed strand of DNA (i.e., the strand having
the sequence homologous to the mRNA).

A DNA "coding sequence" is a DNA sequence
which is transcribed and translated into a polypeptide
in vivo when placed under the control of appropriate
regulatory sequences. The boundaries of the coding
sequence are determined by a start codon at the 5'
(amino) terminus and a translation stop codon at the
3' (carboxy) terminus. A coding sequence can include,
but is not limited to, procaryotic sequences, cDNA
from eucaryotic mRNA, genomic DNA sequences from
eucaryotic (e.g., mammalian) DNA, viral DNA, and even
synthetic DNA sequences. A polyadenylation signal and
transcription termination sequence will usually be
located 3' to the coding sequence.

A "transcriptional promoter sequence" is a
DNA regulatory region capable of binding RNA
polymerase in a cell and initiating transcription of a
downstream (3' direction) coding sequence. For
purposes of defining the present invention, the
promoter sequence is bound at the 3' terminus by the
translation start codon (ATG) of a coding sequence and
extends upstream (5' direction) to include the minimum
number of bases or elements necessary to initiate
transcription at levels detectable above background.
Within the promoter sequence will be found a
transcription initiation site (conveniently defined by
mapping with nuclease S1), as well as protein binding
domains (consensus sequences) responsible for the
binding of RNA polymerase. Eucaryotic promoters will
often, but not always, contain "TATA" boxes and "CAAT"
boxes. Procaryotic promoters contain Shine-Dalgarno
sequences in addition to the -10 and -35 consensus
sequences.

DNA "control sequences" refer collectively
to promoter sequences, ribosome binding sites,
polyadenylation signals, transcription termination
sequences, upstream regulatory domains, enhancers, and
the like, which collectively provide for the
transcription and translation of a coding sequence in
a host cell.

A coding sequence or sequence encoding is
"operably linked to" or "under the control of" control
sequences in a cell when RNA polymerase will bind the
promoter sequence and transcribe the coding sequence
into mRNA, which is then translated into the
polypeptide encoded by the coding sequence.

A "host cell" is a cell which has been 
transformed, or is capable of transformation, by an 
exogenous DNA sequence. 

A cell has been "transformed" by exogenous
DNA when such exogenous DNA has been introduced inside
the cell membrane. Exogenous DNA may or may not be
integrated (covalently linked) to chromosomal DNA
making up the genome of the cell. In procaryotes and
yeasts, for example, the exogenous DNA may be
maintained on an episomal element, such as a plasmid.
A stably transformed cell is one in which the
exogenous DNA has become integrated into the
chromosome so that it is inherited by daughter cells
through chromosome replication. For mammalian cells,
this stability is demonstrated by the ability of the
cell to establish cell lines or clones comprised of a
population of daughter cells containing the exogenous
DNA.

A "clone" is a population of daughter cells
derived from a single cell or common ancestor. A
"cell line" is a clone of a primary cell that is
capable of stable growth in vitro for many
generations.

Two polypeptide sequences are "substantially
homologous" when at least about 80% (preferably at
least about 90%, and most preferably at least about
95%) of the amino acids match over a defined length of
the molecule.

Two DNA sequences are "substantially
homologous" when they are identical or differ in no
more than 40% of the nucleotides, more preferably in
no more than about 20% of the nucleotides, and most
preferably in no more than about 10% of the nucleotides.
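
As an illustration of the percentage thresholds in the two homology definitions above, the following sketch scores two sequences that are assumed to be already aligned, of equal length and free of gaps. Treating the "about" values as exact cutoffs is a simplification made here, and the function names and example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def dna_homology_tier(seq_a, seq_b):
    """Classify two DNA sequences by the percentage of differing nucleotides."""
    difference = 100.0 - percent_identity(seq_a, seq_b)
    if difference <= 10.0:
        return "most preferred (no more than 10% different)"
    if difference <= 20.0:
        return "more preferred (no more than 20% different)"
    if difference <= 40.0:
        return "substantially homologous (no more than 40% different)"
    return "not substantially homologous"


def polypeptides_substantially_homologous(seq_a, seq_b, threshold=80.0):
    """True when at least `threshold` percent of the amino acids match."""
    return percent_identity(seq_a, seq_b) >= threshold


print(dna_homology_tier("ACGTACGTAC", "ACGTACGTAA"))
print(polypeptides_substantially_homologous("MKTAYIAKQR", "MKTAYIAKQL"))
```

Real comparisons would first need an alignment step and a stated comparison window ("a defined length of the molecule"), neither of which is modelled here.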

DNA sequences that are substantially
homologous can be identified in a Southern
hybridization experiment under, for example, stringent
conditions, as defined for that particular system.
Defining appropriate hybridization conditions is
within the skill of the art. See, e.g., Maniatis et
al., supra; DNA Cloning, vols. I & II, supra; Nucleic
Acid Hybridization, supra.

A "heterologous" region of a DNA construct
is an identifiable segment of DNA within or attached
to another DNA molecule that is not found in
association with the other molecule in nature. Thus,
when the heterologous region encodes a viral gene, the
gene will usually be flanked by DNA that does not
flank the viral gene in the genome of the source virus
or virus-infected cells. Another example of the
heterologous coding sequence is a construct where the
coding sequence itself is not found in nature (e.g.,
synthetic sequences having codons different from the
native gene). Allelic variation or naturally
occurring mutational events do not give rise to a
heterologous region of DNA, as used herein.

"Bovine host" refers to cattle of any breed, 
adult or infant. 

The term "protein" is used herein to
designate a polypeptide or glycosylated polypeptide,
respectively, unless otherwise noted. The term
"polypeptide" is used in its broadest sense, i.e., any
polymer of amino acids (dipeptide or greater) linked
through peptide bonds. Thus, the term "polypeptide"
includes proteins, oligopeptides, protein fragments,
analogs, muteins, fusion proteins and the like.

"Fusion protein" is usually defined as the 
expression product of a gene comprising a first region
encoding a leader sequence or a stabilizing
polypeptide, and a second region encoding a
heterologous protein. It involves a polypeptide
comprising an antigenic protein fragment or a full
length BAV protein sequence as well as (a)
heterologous sequence(s), typically a leader sequence
functional for secretion in a recombinant host for
intracellularly expressed polypeptide, or an
N-terminal sequence that protects the protein from host
cell proteases, such as SOD. An antigenic protein
fragment is usually about 5-7 amino acids in length.

"Native" proteins or polypeptides refer to 
proteins or polypeptides recovered from BAV or
BAV-infected cells. Thus, the term "native BAV
polypeptide" would include naturally occurring BAV
proteins and fragments thereof. "Non-native"
polypeptides refer to polypeptides that have been
produced by recombinant DNA methods or by direct
synthesis. "Recombinant" polypeptides refers to
polypeptides produced by recombinant DNA techniques;
i.e., produced from cells transformed by an exogenous
DNA construct encoding the desired polypeptide.

A "substantially pure" protein will be free
of other proteins, preferably at least 10%
homogeneous, more preferably 60% homogeneous, and most
preferably 95% homogeneous.

An "antigen" refers to a molecule containing
one or more epitopes that will stimulate a host's
immune system to make a humoral and/or cellular
antigen-specific response. The term is also used
interchangeably with "immunogen."

A "hapten" is a molecule containing one or
more epitopes that does not stimulate a host's immune
system to make a humoral or cellular response unless
linked to a carrier.

The term "epitope" refers to the site on an
antigen or hapten to which a specific antibody
molecule binds or which is recognized by T cells. The term
is also used interchangeably with "antigenic
determinant" or "antigenic determinant site."

An "immunological response" to a composition
or vaccine is the development in the host of a
cellular and/or antibody-mediated immune response to
the composition or vaccine of interest. Usually, such
a response consists of the subject producing
antibodies, B cells, helper T cells, suppressor T
cells, and/or cytotoxic T cells directed specifically
to an antigen or antigens included in the composition
or vaccine of interest.

The terms "immunogenic polypeptide" and
"immunogenic amino acid sequence" refer to a
polypeptide or amino acid sequence, respectively,
which elicit antibodies that neutralize viral
infectivity, and/or mediate antibody-complement or
antibody dependent cell cytotoxicity to provide
protection of an immunized host. An "immunogenic
polypeptide" as used herein, includes the full length
(or near full length) sequence of the desired protein
or an immunogenic fragment thereof.
By "immunogenic fragment" is meant a
fragment of a polypeptide which includes one or more
epitopes and thus elicits antibodies that neutralize
viral infectivity, and/or mediates antibody-complement
or antibody dependent cell cytotoxicity to provide
protection of an immunized host. Such fragments will
usually be at least about 5 amino acids in length, and
preferably at least about 10 to 15 amino acids in
length. There is no critical upper limit to the
length of the fragment, which could comprise nearly
the full length of the protein sequence, or even a
fusion protein comprising fragments of two or more of
the antigens.

The term "treatment" as used herein
refers to treatment of a mammal, such as bovine or the
like, and includes either (i) the prevention of infection or
reinfection (prophylaxis), or (ii) the reduction or
elimination of symptoms of an infection. The vaccine
comprises the recombinant BAV itself or recombinant
antigen produced by recombinant BAV.

By "infectious" is meant having the capacity
to deliver the viral genome into cells.

B. General Method 

The present invention identifies and
provides a means of deleting part or all of the
nucleotide sequence of bovine adenovirus E1 and/or E3
gene regions to provide sites into which heterologous
or homologous nucleotide sequences encoding foreign
genes or fragments thereof can be inserted to generate
bovine adenovirus recombinants. By "deleting part of"
the nucleotide sequence is meant using conventional
genetic engineering techniques for deleting the
nucleotide sequence of part of the E1 and/or E3
region.

Various foreign genes or coding sequences
(prokaryotic and eukaryotic) can be inserted in the
bovine adenovirus nucleotide sequence, e.g., DNA, in
accordance with the present invention, particularly to
provide protection against a wide range of diseases,
and many such genes are already known in the art. The
problem heretofore has been to provide a safe,
convenient and effective vaccine vector for the genes
or coding sequences.
It is also possible that only fragments of
nucleotide sequences of genes can be used (where these
are sufficient to generate a protective immune
response) rather than the complete sequence as found
in the wild-type organism. Where available, synthetic
genes or fragments thereof can also be used. However,
the present invention can be used with a wide variety
of genes, fragments and the like, and is not limited to
those set out above.

In some cases the gene for a particular
antigen can contain a large number of introns or can
be from an RNA virus; in these cases a complementary
DNA copy (cDNA) can be used.

In order for successful expression of the
gene to occur, it can be inserted into an expression
vector together with a suitable promoter including
enhancer elements and polyadenylation sequences. A
number of eucaryotic promoter and polyadenylation
sequences which provide successful expression of
foreign genes in mammalian cells, and how to construct
expression cassettes, are known in the art, for
example in U.S. patent 5,151,267, the disclosures of
which are incorporated herein by reference. The
promoter is selected to give optimal expression of
immunogenic protein which in turn satisfactorily leads
to humoral, cell mediated and mucosal immune responses
according to known criteria.

The foreign protein produced by expression
in vivo in a recombinant virus-infected cell may be
itself immunogenic. More than one foreign gene can be
inserted into the viral genome to obtain successful
production of more than one effective protein.

Thus with the recombinant virus of the
present invention, it is possible to provide
protection against a wide variety of diseases
affecting cattle. Any of the recombinant antigenic
determinants or recombinant live viruses of the invention
can be formulated and used in substantially the same
manner as described for the antigenic determinant
vaccines or live vaccine vectors.

The antigens used in the present invention
can be either native or recombinant antigenic
polypeptides or fragments. They can be partial
sequences, full-length sequences, or even fusions
(e.g., having appropriate leader sequences for the
recombinant host, or with an additional antigen
sequence for another pathogen). The preferred
antigenic polypeptides to be expressed by the virus
systems of the present invention contain full-length
(or near full-length) sequences encoding antigens.
Alternatively, shorter sequences that are antigenic
(i.e., encode one or more epitopes) can be used. The
shorter sequence can encode a "neutralizing epitope,"
which is defined as an epitope capable of eliciting
antibodies that neutralize virus infectivity in an in
vitro assay. Preferably the peptide should encode a
"protective epitope" that is capable of raising in the
host a "protective immune response;" i.e., an
antibody- and/or a cell-mediated immune response that
protects an immunized host from infection.

The antigens used in the present invention,
particularly when comprised of short oligopeptides,
can be conjugated to a vaccine carrier. Vaccine
carriers are well known in the art: for example,
bovine serum albumin (BSA), human serum albumin (HSA)
and keyhole limpet hemocyanin (KLH). A preferred
carrier protein, rotavirus VP6, is disclosed in EPO
Pub. No. 0259149, the disclosure of which is
incorporated by reference herein.

Genes for desired antigens or coding
sequences thereof which can be inserted include those
of organisms which cause disease in mammals,
particularly bovine pathogens such as bovine
rotavirus, bovine coronavirus, bovine herpes virus
type 1, bovine respiratory syncytial virus, bovine
parainfluenza virus type 3 (BPI-3), bovine diarrhea
virus, Pasteurella haemolytica, Haemophilus somnus and
the like. The vaccines of the invention carrying
foreign genes or fragments can also be orally
administered in a suitable oral carrier, such as in an
enteric-coated dosage form. Oral formulations include
such normally-employed excipients as, for example,
pharmaceutical grades of mannitol, lactose, starch,
magnesium stearate, sodium saccharin, cellulose,
magnesium carbonate, and the like. Oral vaccine
compositions may be taken in the form of solutions,
suspensions, tablets, pills, capsules, sustained
release formulations, or powders, containing from
about 10% to about 95% of the active ingredient,
preferably about 25% to about 70%. An oral vaccine
may be preferable to raise mucosal immunity in
combination with systemic immunity, which plays an
important role in protection against pathogens
infecting the gastrointestinal tract.

In addition, the vaccine can be formulated into
a suppository. For suppositories, the vaccine
composition will include traditional binders and
carriers, such as polyalkylene glycols or
triglycerides. Such suppositories may be formed from
mixtures containing the active ingredient in the range
of about 0.5% to about 10% (w/w), preferably about 1%
to about 2%.

Protocols for administering to animals the
vaccine composition(s) of the present invention are
within the skill of the art in view of the present
disclosure. Those skilled in the art will select a
concentration of the vaccine composition in a dose
effective to elicit an antibody and/or T-cell mediated
immune response to the antigenic fragment. Within
wide limits, the dosage is not believed to be
critical. Typically, the vaccine composition is
administered in a manner which will deliver between
about 1 to about 1,000 micrograms of the subunit
antigen in a convenient volume of vehicle, e.g., about
1-10 cc. Preferably, the dosage in a single
immunization will deliver from about 1 to about 500
micrograms of subunit antigen, more preferably about
5-10 to about 100-200 micrograms (e.g., 5-200
micrograms).

The timing of administration may also be
important. For example, a primary inoculation
preferably may be followed by subsequent booster
inoculations if needed. It may also be preferred,
although optional, to administer a second, booster
immunization to the animal several weeks to several
months after the initial immunization. To insure
sustained high levels of protection against disease,
it may be helpful to readminister a booster
immunization to the animals at regular intervals, for
example once every several years. Alternatively, an
initial dose may be administered orally followed by
later inoculations, or vice versa. Preferred
vaccination protocols can be established through
routine vaccination protocol experiments.

The dosage for all routes of administration
of in vivo recombinant virus vaccine depends on
various factors including the size of the patient, the nature
of the infection against which protection is needed, the
carrier and the like, and can readily be determined by
those of skill in the art. By way of non-limiting
example, a dosage of between 10^3 pfu and 10^6 pfu and
the like can be used. As with in vitro subunit
vaccines, additional dosages can be given as
determined by the clinical factors involved.

In one embodiment of the invention, a number
of recombinant cell lines are produced according to
the present invention by constructing an expression
cassette comprising the BAV E1 region and transforming
host cells therewith to provide cell lines or cultures
expressing the E1 proteins. These recombinant cell
lines are capable of allowing a recombinant BAV,
having an E1 gene region deletion replaced by
heterologous nucleotide sequence encoding for a
foreign gene or fragment, to replicate and express the
desired foreign gene or fragment thereof which is
encoded within the recombinant BAV. These cell lines
are also extremely useful in generating recombinant
BAV, having an E3 gene deletion replaced by
heterologous nucleotide sequence encoding for a
foreign gene or fragment, by in vivo recombination
following DNA-mediated cotransfection.

In one embodiment of the invention, the
recombinant expression cassette can be obtained by
cleaving the wild-type BAV genome with an appropriate
restriction enzyme to produce a DNA fragment
representing the left end or the right end of the
genome comprising E1 or E3 gene region sequences,
respectively, inserting the left or right end
fragment into a cloning vehicle, such as a plasmid, and
thereafter inserting at least one DNA sequence
encoding a foreign protein into the E1 or E3 deletion
with or without the control of an exogenous promoter.
The recombinant expression cassette is contacted with
the wild-type BAV DNA through homologous recombination
or other conventional genetic engineering methods
within an E1 transformed cell line to obtain the
desired recombinant.

The invention also includes an expression
system comprising a bovine adenovirus expression
vector wherein a heterologous nucleotide sequence, e.g. DNA,
replaces part or all of the E3 region and/or part or
all of the E1 region. The expression system can be
used wherein the foreign nucleotide sequence, e.g.
DNA, is with or without the control of any other
heterologous promoter.

The BAV E1 gene products of the adenovirus
of the invention transactivate most of the cellular
genes, and therefore, cell lines which constitutively
express E1 proteins can express cellular polypeptides
at a higher level than normal cell lines. The
recombinant mammalian, particularly bovine, cell lines
of the invention can be used to prepare and isolate
polypeptides, including those such as (a) proteins
associated with adenovirus E1A proteins, e.g. p300,
retinoblastoma (Rb) protein, cyclins, kinases and the
like; (b) proteins associated with adenovirus E1B
protein, e.g. p53 and the like; (c) growth factors,
such as epidermal growth factor (EGF), transforming
growth factor (TGF) and the like; (d) receptors such
as epidermal growth factor receptor (EGF-R),
fibroblast growth factor receptor (FGF-R), tumor
necrosis factor receptor (TNF-R), insulin-like growth
factor receptor (IGF-R), major histocompatibility
complex class I receptor and the like; (e) proteins
encoded by proto-oncogenes such as protein kinases
(tyrosine-specific protein kinases and protein kinases
specific for serine or threonine), p21 proteins
(guanine nucleotide-binding proteins with GTPase
activity) and the like; (f) other cellular proteins
such as actins, collagens, fibronectins, integrins,
phospholipids, proteoglycans, histones and the like;
and (g) proteins involved in regulation of
transcription such as TATA-box-binding protein (TBP),
TBP-associated factors (TAFs), SP1 binding protein and
the like.

The invention also includes a method for
providing gene therapy to a mammal in need thereof to
control a gene deficiency which comprises
administering to said mammal a live recombinant bovine
adenovirus containing a foreign nucleotide sequence
encoding a non-defective form of said gene under
conditions wherein the recombinant virus vector genome
is incorporated into said mammalian genome or is
maintained independently and extrachromosomally to
provide expression of the required gene in the target
organ or tissue. These kinds of techniques have
recently been used by those of skill in the art to
replace a defective gene or portion thereof. Examples
of foreign gene nucleotide sequences or portions
thereof that can be incorporated for use in a
conventional gene therapy include the cystic fibrosis
transmembrane conductance regulator gene, the human
minidystrophin gene, the alpha1-antitrypsin gene and the
like.


Examples 

Described below are examples of the present
invention. These examples are provided only for
illustrative purposes and are not intended to limit
the scope of the present invention in any way. In
light of the present disclosure, numerous embodiments
within the scope of the claims will be apparent to
those of ordinary skill in the art. The contents of
the references cited in the specification are
incorporated by reference herein.

Cells and viruses 

Cell culture media and reagents were
obtained from GIBCO/BRL Canada (Burlington, Ontario,
Canada). Media were supplemented with 25 mM Hepes and
50 µg/ml gentamicin. MDBK cells or MDBK cells
transformed with a plasmid containing BAV3 E1
sequences were grown in MEM supplemented with 10%
fetal bovine serum. The wild-type BAV3 (strain WBR-1)
(Darbyshire et al, 1965 J. Comparative Pathology
25:327) was kindly provided by Dr. B. Darbyshire,
University of Guelph, Guelph, Canada. Working stocks
of the wild-type BAV3 and the BAV3-luciferase
recombinants were prepared, and virus titrations
were done, in MDBK cells.


Enzymes, bacteria and plasmids 

Restriction endonucleases, polymerase chain
reaction (PCR) and other enzymes required for DNA
manipulations were purchased from Pharmacia LKB
Biotechnology (Canada) Ltd. (Dorval, Quebec, Canada),
Boehringer-Mannheim, Inc. (Laval or Montreal, Quebec,
Canada), New England BioLabs (Beverly, MA), or
GIBCO/BRL Canada (Burlington, Ontario, Canada) and
used as per manufacturer's instructions. Restriction
enzyme fragments of BAV3 DNA were inserted into pUC18
or pUC19 (Yanisch-Perron et al (1985) Gene 33:103-109)
following standard procedures (Sambrook et al (1989)
Molecular Cloning: A Laboratory Manual, 2nd ed. Cold
Spring Harbor Laboratory, New York). E. coli strain
DH5 (supE44 hsdR17 recA1 endA1 gyrA96 thi-1 relA1) was
transformed with recombinant plasmids by
electroporation (Dower et al. (1988) Nuc. Acids Res.,
16:6127-6145). Plasmid DNA was prepared using the
alkaline lysis procedure (Birnboim and Doly (1978)
Nuc. Acids Res., 7:1513-1523). The plasmid, pSVOA/L
containing the entire cDNA encoding firefly luciferase
(de Wet et al (1987) Mol. Cell. Biol. 7:725-737), was
a gift from D.R. Helinski, University of California,
San Diego, La Jolla, CA.

Construction of recombinant BAV3 

MDBK cells transformed with a plasmid
containing BAV3 E1 sequences were cotransfected with
the wt BAV3 DNA digested with PvuI and the plasmid,
pSM51-Luc (Figs. 9 and 10) using the
lipofection-mediated cotransfection protocol (GIBCO/BRL, Life
Technologies, Inc., Grand Island, NY). The virus
plaques produced following cotransfection were
isolated and plaque purified, and the presence of the
luciferase gene in the BAV3 genome was detected by
agarose gel electrophoresis of recombinant virus DNA
digested with appropriate restriction enzymes.
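
Screening recombinants by restriction digestion relies on an inserted gene changing the pattern of fragment lengths. The sketch below illustrates the idea on a linear molecule using the standard recognition sequences of BamHI (G^GATCC), EcoRI (G^AATTC) and XbaI (T^CTAGA). The two toy sequences are invented for illustration and bear no relation to the BAV3 or luciferase sequences.

```python
# Recognition site and cut offset (top strand) for three common enzymes.
SITES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
}


def digest_linear(sequence, enzyme):
    """Return fragment lengths (largest first) after digesting a linear DNA molecule."""
    site, offset = SITES[enzyme]
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])), reverse=True)


# Toy molecules (hypothetical): the "recombinant" carries an insert with an extra BamHI site.
parental = "A" * 30 + "GGATCC" + "T" * 40
recombinant = "A" * 30 + "GGATCC" + "C" * 20 + "GGATCC" + "T" * 40

print("parental   :", digest_linear(parental, "BamHI"))
print("recombinant:", digest_linear(recombinant, "BamHI"))
```

An extra or missing band relative to the parental pattern is what an agarose gel would reveal; which enzymes are "appropriate" depends on the actual sequences, which this sketch does not model.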

Southern blot and hybridization

Mock or virus-infected MDBK cells were
harvested in lysis buffer (500 µg/ml pronase in 0.01 M
Tris, pH 7.4, 0.01 M EDTA, 0.5% SDS) and DNA was
extracted (Graham et al (1991) Manipulation of
adenovirus vectors In: Methods and Molecular Biology,
7: Gene Transfer and Expression Techniques (Eds. Murray
and Walker) Humana Press, Clifton, N.J. pp. 109-128).
100 ng DNA was digested either with BamHI, EcoRI or
XbaI and resolved on a 1% agarose gel by



WO 95/16048 



PCT/CA94/00678 




electrophoresis. DNA bands from the agarose gel were
transferred to a GeneScreenPlus™ membrane (Du Pont
Canada Inc. (NEN Products), Lachine, Quebec, Canada)
by the capillary blot procedure (Southern, E.M. (1975)
J. Mol. Biol. 98:503-517). Probes were labeled with
32P using an Oligolabeling Kit (Pharmacia LKB
Biotechnology (Canada) Ltd., Dorval, Quebec, Canada)
and the unincorporated label was removed by passing
the labeled probe through a Sephadex G-50 column
(Sambrook et al (1989) supra). Probes were kept in a
boiling water bath for 2 min and used in hybridization
experiments following the GeneScreenPlus™ hybridization
protocol. The DNA bands which hybridized with the
probe were visualized by autoradiography.

Luciferase assays 

The protocol was essentially the same as
described (Mittal et al (1993) Virus Res. 28:67-90).
Briefly, MDBK cell monolayers in 25 mm multi-well
dishes (Corning Glass Works, Corning, NY) were
infected in duplicate with either BAV3-Luc (3.1) or
BAV3-Luc (3.2) at an m.o.i. of 50 p.f.u. per cell. At
the indicated time points postinfection, recombinant
virus-infected cell monolayers were washed once with
PBS (0.137 M NaCl, 2.7 mM KCl, 8 mM Na2HPO4, 1.5 mM
KH2PO4) and harvested in 1 ml luciferase extraction
buffer (100 mM potassium phosphate, pH 7.8, 1 mM
dithiothreitol). The cell pellets were resuspended in
200 µl of luciferase extraction buffer and lysed by
three cycles of freezing and thawing. The
supernatants were assayed for luciferase activity.
For the luciferase assay, 20 µl of undiluted or
serially diluted cell extract was mixed with 350 µl of
luciferase assay buffer (25 mM glycylglycine, pH 7.8,
15 mM MgCl2, 5 mM ATP) in a 3.5 ml tube (Sarstedt Inc.,
St-Laurent, Quebec, Canada). Up to 48 tubes could be
kept in the luminometer rack and the equipment was
programmed to inject 100 µl of luciferin solution (1 mM
luciferin in 100 mM potassium phosphate buffer, pH
7.8) into the tube present in the luminometer chamber to
start the enzyme reaction. The luminometer (Packard
Picolite Luminometer, Packard Instrument Canada, Ltd.,
Mississauga, Ontario, Canada) used in the present
study produced 300 to 450 light units of background
count in a 10 sec reaction time. Known amounts of
purified firefly luciferase were used in luciferase
assays to calculate the amount of active luciferase
present in each sample.
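
The conversion from light units to an amount of active luciferase can be illustrated with a simple standard curve. The sketch below is not the procedure used in the study; it assumes a linear response, and the standard amounts, readings and function names are hypothetical values chosen only for illustration. The fitted intercept plays the role of the background count.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def luciferase_amount(light_units, slope, intercept, dilution=1.0):
    """Amount of luciferase in the assayed aliquot, corrected for dilution."""
    return (light_units - intercept) / slope * dilution


# Hypothetical standards: pg of purified luciferase vs. light units in 10 s.
STANDARD_PG = [0.0, 10.0, 20.0, 40.0]
STANDARD_LU = [400.0, 10400.0, 20400.0, 40400.0]

slope, intercept = fit_line(STANDARD_PG, STANDARD_LU)
# A 1:10 diluted extract that reads 5400 light units (hypothetical reading).
sample_pg = luciferase_amount(5400.0, slope, intercept, dilution=10.0)
```

Readings that fall outside the range of the standards would normally be repeated at another dilution rather than extrapolated.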

Western blotting 

Mock or virus-infected MDBK cells were lysed
in 1:2 diluted 2X loading buffer (80 mM Tris-HCl, pH
6.8, 0.67 M urea, 25% glycerol, 2.5% SDS, 1 M
mercaptoethanol, 0.001% bromophenol blue), boiled for
3 min and then centrifuged to pellet cell debris.
Proteins were separated by SDS-polyacrylamide gel
electrophoresis (SDS-PAGE) on 0.1% SDS-10%
polyacrylamide gels (Laemmli (1970) Nature
227:680-685). At the end of the run, polypeptide
bands in the gel were electrophoretically transferred
to a nitrocellulose membrane (Bio-Rad Laboratories,
Richmond, CA). The membrane was incubated at room
temperature for 2 h with 1:4000 diluted rabbit anti-
luciferase antibody (Mittal et al (1993) supra). The
binding of anti-luciferase antibody to the specific
protein band(s) on the membrane was detected with
1:5000 diluted horseradish peroxidase-conjugated goat
anti-rabbit IgG (Bio-Rad Laboratories, Richmond, CA)
and an ECL Western blotting detection system
(Amersham Canada Ltd., Oakville, Ontario).

Example 1 Cloning of BAV3 E1 Region DNA for sequencing
To complement the published restriction sites (Kurokawa
et al, 1978 J. Virol. 28:212-218; Hu et al, 1984
J. Virol. 49:604-608), other restriction enzyme sites in
the BAV3 genome were defined. The 8.4 kilobase pair
(kb) SalI B fragment, which extends from the left end
of the genome to approximately 24%, was cloned into the
SmaI-SalI sites of pUC18 essentially as described
previously (Graham et al, 1989 EMBO Journal 8:2077-
2085). Beginning at the left end of the BAV3 genome,
the relevant restriction sites used for subsequent
subcloning and their approximate positions are: SacI
(2%), EcoRI (3.5%), HindIII (5%), SacI (5.5%), SmaI
(5.6%) and HindIII (11%). Through the use of
appropriate restriction enzymes, the original plasmid
was collapsed to contain smaller inserts which could
be sequenced using the pUC universal primers. Some
fragments were also subcloned in both pUC18 and pUC19
to allow confirmational sequencing in both directions.
These procedures, together with the use of twelve
different oligonucleotide primers hybridizing with
BAV3 sequences, allowed the BAV3 genome to be sequenced
from its left end to the HindIII site at 11%.

To ensure that some features of the sequence
obtained were not unique to the initial clone selected
for sequencing, two more pUC19 clones were prepared
containing the SalI fragment from a completely
independent DNA preparation. These clones were used
to confirm the original sequence for the region from
approximately 3% to 5.5% of the BAV3 genome.

DNA sequencing reactions were based on the
chain-termination method (Sanger et al. 1977 PNAS USA
74:5463-5467) and manual sequencing followed the DNA
sequencing protocol described in the Sequenase™ kit
produced by US Biochemical. [α-35S]dATP was obtained
from Amersham Canada Ltd. All oligonucleotides used
as primers were synthesized by the Central Facility of
the Molecular Biology and Biotechnology Institute
(MOBIX) at McMaster University, Hamilton, Ontario.
The entire region (0 to 11%) of the BAV3 genome was
sequenced by at least two independent determinations
for each position by automated sequencing on a 373A
DNA Sequencer (Applied Biosystems) using Taq-Dye
terminators. Over half of the region was further
sequenced by manual procedures to confirm overlaps and
other regions of interest.

DNA sequence analysis and protein
comparisons were carried out using the MICROGENIE program.

Example 2 Coding Sequences of the BAV3 E1 Region
BAV3 genomic DNA, from the left end of the
genome to the HindIII site at approximately 11%, was
cloned into plasmids and sequenced by a combination of
manual and automated sequencing. An examination of
the resultant BAV3 E1 genomic sequence (Fig 1)
revealed a number of interesting features relevant
both to transactivation and to other functions
associated with adenovirus E1 proteins. On the basis
of open reading frames (ORFs) it was possible to
assign potential coding regions analogous to those
defined in human Ad5 (HAd5). As shown in Fig 1, ORFs
corresponding roughly to the first exon and unique
region of HAd5 E1A, as well as ORFs corresponding to
the 19k and 58k proteins of E1B and the ORF
corresponding to protein IX, were all defined in this
sequence. The open reading frame defining the
probable E1A coding region begins at the ATG at nt 606
and continues to a probable splice donor site at
position 1215. The first consensus splice acceptor
site after this is located after nt 1322 and defines
an intron of 107 base pairs with an internal consensus
splice branching site at position 1292. The putative
BAV3 E1A polypeptide encoded by a message
corresponding to these splice sites would have 211
amino acids and an unmodified molecular weight of
23,323. The major homology between the protein encoded by
this ORF and HAd5 E1A is in the residues corresponding
to CR3 (shown in Fig 2). The homology of amino acid
sequences on both sides of the putative intron
strengthens the assignment of probable splice donor
and acceptor sites. CR3 has been shown to be of
prime importance in the transactivation activity of
HAd5 E1A gene products. As seen in Fig. 2A, the
homology of this sequence in the BAV3 protein to the
corresponding region of the 289R E1A protein of HAd5
includes complete conservation of the
Cys-X2-Cys-X13-Cys-X2-Cys sequence motif which defines the
metal binding site of this protein (Berg, 1986 Science
232:485-487) as well as conservation of a number of
amino acids within this region and within the promoter
binding region as defined by Lillie and Green (1989
Nature 338:39-44).

The only other region of significant
homology between the BAV3 E1A protein and that of HAd5
was a stretch of amino acids known to be important in
binding of the cellular Rb protein to the HAd5 E1A
protein (Dyson et al, 1990 J. Virol. 64:1353-1356).
As shown in Fig 2B, this sequence, which is located
between amino acids 120 and 132 in the CR2 region of
HAd5 E1A, is found near the amino (N-) terminus of the
BAV3 protein between amino acids 26 and 37.

An open reading frame from the ATG at nt
1476 to the termination signal at 1947 defines a
protein of 157 amino acids with two regions of major
homology to the HAd5 E1B 19k protein. As shown in Fig
3, both the BAV3 and the HAd5 proteins have a centrally
located hydrophobic amino acid sequence. The sequence
in BAV3, with substitutions of valine for alanine and
leucine for valine, should result in a somewhat more
hydrophobic pocket than the corresponding HAd5 region.
The other portion of HAd5 19k that may be conserved in
the BAV3 protein is the serine-rich sequence found
near the N-terminus (residues 20 to 26) in HAd5 19k
and near the C-terminus (residues 136 to 142) in the
BAV3 protein (also shown in Fig 3).

An ORF beginning at the ATG at nt 1850 and
terminating at nt 3110 overlaps the preceding BAV3
protein reading frame and thus has the same
relationship to it as does the HAd5 E1B 56k protein to
the E1B 19k protein. As shown in Fig 4, this BAV3 protein
of 420R and the corresponding HAd5 E1B 56k protein of
496R show considerable sequence homology over their
C-terminal 346 residues. The N-terminal regions of
these proteins (not depicted in the figure) show no
significant homology and differ in overall length.

Following the E1B ORFs, the open reading
frame beginning at nt 3200 and ending at the
translation terminator TAA at nt 3575 defines a
protein of 125R with an unmodified molecular weight of
13,706. As seen in Fig 5, this protein shares some
homology with the structural protein IX of HAd5,
particularly in N-terminal sequences.
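
The residue counts quoted for the E1 ORFs can be cross-checked against the stated nucleotide coordinates. The sketch below is only an arithmetic check under assumed conventions: each quoted position is taken as the first base of the start or stop codon, the donor position (1215) as the last base of the first E1A exon, the acceptor "after nt 1322" as an exon starting at 1323, and the E1A stop as the TGA at nt 1346 given in the transcription control discussion. The function names are hypothetical.

```python
def orf_residues(atg_start, stop_start):
    """Residues in an unspliced ORF, given the first base of ATG and of the stop codon."""
    length = stop_start - atg_start
    if length % 3:
        raise ValueError("start and stop codons are not in the same frame")
    return length // 3


def spliced_residues(exons):
    """Residues encoded by exons given as inclusive (first, last) coding positions."""
    coding = sum(last - first + 1 for first, last in exons)
    if coding % 3:
        raise ValueError("coding length is not a multiple of three")
    return coding // 3


e1b_small = orf_residues(1476, 1947)   # shorter E1B protein
e1b_large = orf_residues(1850, 3110)   # larger E1B protein
protein_ix = orf_residues(3200, 3575)  # protein IX
# E1A: exon 1 is 606..1215, exon 2 is 1323..1345 (the TGA occupies 1346..1348).
e1a = spliced_residues([(606, 1215), (1323, 1345)])
intron_bp = 1322 - 1216 + 1
```

Under these assumptions the coordinates reproduce the 157, 420, 125 and 211 residue proteins and the 107 bp intron described in the text.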

Possible Transcription Control Regions in BAV3 E1

The inverted terminal repeats (ITR) at the
ends of the BAV3 genome have been shown to extend to
195 nt (Shinagawa et al, 1987 Gene 55:85-93). The
GC-rich 3' portion of the ITR contains a number of
consensus binding sites for the transcription
stimulating protein SP1 (Dynan and Tjian (1983) Cell
35:79-87) and possible consensus sites for the
adenovirus transcription factor (ATF) (Lee et al.
(1987) Nature 325:368-372) occur at nts 60 and 220.
While there are no exact consensus sites for the
factors EF-1A (Bruder and Hearing (1989) Mol. Cell
Biol. 9:5143-5153) or E2F (Kovesdi et al, 1987 PNAS
USA 84:2180-2184) upstream of the ATG at nt 606, there
are numerous degenerate sequences which may define an
enhancer region comparable to that seen in HAd5
(Hearing and Shenk, 1986 Cell 45:229-236).

The proposed BAV3 E1A coding sequence
terminates at a TGA codon at nt 1346 which is
located within a 35 base pair sequence which is
immediately directly repeated (see Fig 1). Two
repeats of this sequence were detected in three
independently derived clones from a plaque purified
stock of BAV3. The number of direct repeats can vary
in any BAV3 population, though plaque purification
allows for isolation of a relatively homogeneous
population of viruses. Whether the direct repeats in the
sequence can function as promoter or enhancer
elements for E1B transcription is being tested. There
are no strong polyA addition consensus sites between
the E1A and the E1B coding sequences and in fact no
AATAA sequence is found until after the protein IX
coding sequences following E1B. The TATAAA sequence
beginning at nt 1453 could function as the proximal
promoter for E1B but it is located closer to the ATG
at 1476 than is considered usual (McKnight et al, 1982
Science 217:316-322). The TATA sequence located
further upstream immediately before the proposed E1A
intron sequence also seems inappropriately positioned
to serve as a transcription box for the E1B proteins.
There are clearly some unique features in this region
of the BAV3 genome.

The transcriptional control elements for the
protein IX transcription unit are conventional and
well defined. Almost immediately following the open
reading frame for the larger E1B protein there is, at
nt 3117, an SP1 binding sequence. This is followed at
3135 by a TATAAAT sequence which could promote a
transcript for the protein IX open reading frame
beginning at the ATG at 3200 and ending with the TAA
at 3575. One polyA addition sequence begins within
the translation termination codon and four other AATAA
sequences are located at nts 3612, 3664, 3796 and
3932.

In keeping with the general organization of
the E1A region of other adenoviruses, the BAV3 E1A
region contains an intron sequence with translation
termination codons in all three reading frames, which
is therefore probably deleted by splicing from
all E1A mRNA transcripts. The largest possible
protein produced from the BAV3 E1A region will have
211 amino acid residues and is the equivalent of the
289 amino acid protein translated from the 13s mRNA of
HAd5. Two striking features in a comparison of these
proteins are the high degree of homology in a region
corresponding to CR3 and the absence in BAV3 of most
of the amino acids corresponding to the second exon of
HAd5. In fact the only amino acids encoded in the
second exon of BAV3 are those which are considered to
constitute part of CR3. A great deal of work carried
out with HAd5 has identified the importance of the CR3
sequences in transactivation of other HAd5 genes.
While a detailed analysis of the corresponding BAV3
region and its possible role in transactivation of
BAV3 genes needs to be carried out, it is nonetheless
interesting to note a couple of possibly
pertinent features. The HAd5 CR3 region has been
operationally subdivided into three regions (Lillie et
al, 1989 Nature 338:39-44; see Fig 8): an N-terminal
region from 139 to 153 which has four acidic residues
and is thought to be important in transcription
activation; a central, metal-binding region defined
by the Cys-X2-Cys-X13-Cys-X2-Cys sequence which is
essential for both promoter binding and activation;
and a C-terminal region (residues 175-189) which is
essential for promoter binding. Since, in most
instances, E1A protein is thought not to interact
directly with DNA (Ferguson et al 1985), the promoter
binding regions may be involved in forming
associations with proteins which then allow
association with DNA. As shown in Fig 2A, the BAV3 E1A protein
contains the central, metal binding domain and has
considerable homology in the carboxy portion of this
region. The BAV3 E1A protein also shows identity of
sequence with HAd5 in the carboxy 6 amino acids of the
promoter binding domain. These features may allow the
BAV3 E1A protein to interact with the same
transcription activating factors required for HAd5 E1A
function. In contrast, except for a Glu-Glu pair
there is little homology between the bovine and human
viruses in the activation domain. The fact that this
domain can be functionally substituted by a
heterologous acidic activation sequence (Lillie et al,
1989 supra) suggests that protein specificity is not
required in this region and this may allow the BAV3
E1A protein to function in the activation of BAV3
genes. The BAV3 E1A activation region contains six
acidic residues in the 18 residues amino to the metal
binding domain.

The other interesting feature of BAV3 E1A,
which is undoubtedly relevant to the oncogenic
potential of this virus, is the presence of the
sequence Asp27-Leu-Glu-Cys-His-Glu which conforms to
a core sequence known to be important in the binding
of cellular Rb and related proteins by the
transforming proteins of a number of DNA tumour
viruses (Dyson et al, 1990 supra). From deletion
mutant analysis there is a clear association between
the potential of HAd5 E1A proteins to bind Rb and the
ability of the protein to induce morphological
transformation in appropriate cells (see references in
Dyson et al, 1990 supra). The BAV3 E1A protein is
distinct from its HAd5 counterpart in the relative
position of this Rb binding sequence, which is in the
CR2 of HAd5 E1A and near the N-terminus of the BAV3
E1A protein.
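
Motifs of this kind can be located in a protein sequence with a simple pattern search. The sketch below is illustrative only: it writes the Rb-binding core in the commonly used Leu-X-Cys-X-Glu form and the metal binding site as Cys-X2-Cys-X13-Cys-X2-Cys, the six-residue fragment is the one quoted above in one-letter code, and the longer test sequence is a hypothetical string, not a real E1A sequence.

```python
import re

# Rb-binding core written as L-x-C-x-E (x = any residue); an assumed convention.
RB_CORE = re.compile(r"L.C.E")
# Metal binding motif Cys-X2-Cys-X13-Cys-X2-Cys.
METAL_BINDING = re.compile(r"C.{2}C.{13}C.{2}C")


def find_motif(pattern, protein):
    """Return 1-based (start, end) positions of each non-overlapping match."""
    return [(m.start() + 1, m.end()) for m in pattern.finditer(protein)]


# Asp-Leu-Glu-Cys-His-Glu, the fragment quoted in the text.
bav3_fragment = "DLECHE"
# Hypothetical toy sequence carrying one metal-binding arrangement.
toy = "MA" + "C" + "GS" + "C" + "A" * 13 + "C" + "GS" + "C" + "K"

rb_hits = find_motif(RB_CORE, bav3_fragment)
metal_hits = find_motif(METAL_BINDING, toy)
```

Positions are reported relative to the string searched, so a hit at position 2 of the fragment corresponds to the Leu that follows Asp27 in the full-length protein.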

Through the use of alternative splice sites,
HAd5 E1A transcripts can give rise to at least 5
distinct mRNA species (Berk et al, 1978 Cell 14:695-
711; Stephens et al, 1987 EMBO Journal 6:2027-2035).
Whether BAV3, like HAd5, can generate a number of
different mRNA species through the use of alternative
splice sites in the E1A transcripts remains to be
determined. For example, a potential splice donor site
which could delete the sequence equivalent to the
unique sequence of HAd5 is present immediately after
nt 1080 but it is not known if this site is actually
used.

HAd5 E1B encodes two proteins (19k and 56k),
either of which can cooperate with E1A, by pathways
which are additive and therefore presumably
independent (McLorie et al, 1991 J. Gen. Virol.
72:1467-1471), to produce morphological transformation
of cells in culture (see for example: Branton et al,
1985 supra; Graham, 1984 supra). The significance of
the conservation of the hydrophobic stretch of amino
acids in the central portion of the shorter E1B
proteins of HAd5 and BAV3 is not clear as yet. A
second short region of homology,
Gln-Ser-Ser-X-Ser-Thr-Ser, at residue 136 near the
C-terminus of the BAV3
protein is located near the N-terminus at residue 20
in the HAd5 19k protein. The major difference in both
length and sequence of the larger (420R) E1B protein
of BAV3 from the corresponding HAd5 protein (496R) is
confined to the N-terminus of these proteins. The two
proteins show considerable evolutionary homology in
the 345 amino acids that extend to their C-termini. A
similar degree of homology extends into the N-terminal
halves of protein IX of BAV3 and HAd5. Taken together,
these analyses suggest that while BAV3 and the human
adenoviruses have diverged by simple point mutational
events in some regions, more dramatic genetic events
such as deletion and recombination may have been
operating in other regions, particularly those defining
the junction between E1A and E1B.

Example 3 Cloning and sequencing of the BAV3 E3 and fibre genes

The general organization of adenovirus
genomes seems to be relatively well conserved, so it
was possible to predict, from the locations of a
number of HAd E3 regions, that BAV E3 should lie
between map units (m.u.) 77 and 86. To prepare DNA for
cloning and sequencing, BAV3 (strain WBR-1) was grown
in Madin-Darby bovine kidney (MDBK) cells, virions
were purified and DNA was extracted (Graham, F.L. &
Prevec, L. (1991) Methods in Molecular Biology, vol.
7, Gene Transfer and Expression Protocols, pp. 109-
146. Edited by E.J. Murray, Clifton, New Jersey;
Humana Press.). Previously published restriction maps
for EcoRI and BamHI (Kurokawa et al., 1978) were
confirmed (Fig. 6). The BamHI D and EcoRI F fragments
of BAV3 DNA were isolated and inserted into pUC18 and
pUC19 vectors, and nested sets of deletions were made
using exonuclease III and S1 nuclease (Henikoff, S.
(1984) Gene, 28:351-359). The resulting clones were
sequenced by the dideoxynucleotide chain termination
technique (Sanger, F., Nicklen, S. & Coulson, A.R.
(1977) Proceedings of the National Academy of
Sciences, U.S.A., 74:5463-5467). The nucleotide
sequence from positions 1 to 287 was obtained from the
right end of the BamHI B fragment (Fig. 6). The
sequence of the regions spanning (i) the BamHI site at
nucleotide 3306 and the EcoRI site at nucleotide 3406,
and (ii) the EcoRI site at nucleotide 4801 and
nucleotide 5100 was obtained from a plasmid containing
the XbaI C fragment (m.u. 83 to 100; not shown) using
primers hybridizing to BAV3 sequences. Analysis of the
sequence was performed with the aid of the PC/GENE
sequence analysis package developed by Amos Bairoch,
Department of Medical Biochemistry, University of
Geneva, Switzerland.

The 5100 nucleotide sequence which extends
between 77 and 92 m.u. of the BAV3 genome is shown in
Fig. 7. The upper strand contains 14 open reading
frames (ORFs) which could encode polypeptides of 60
amino acid residues or more (Fig. 6 and 7). The lower
strand contains no ORF encoding a protein of longer
than 50 amino acids after an initiation codon. The
predicted amino acid sequence for each ORF on the
upper strand was analyzed for homology with predicted
amino acid sequences from several sequenced Ads: HAd2
(Herisse, J., Courtois, G. & Galibert, F. (1980)
Nucleic Acids Research, 8:2173-2192; Herisse, J.,
Courtois, G. & Galibert, F. (1981) Nucleic Acids
Research, 9:1229-1249), -3 (Signas, C., Akusjarvi, G. &
Pettersson, U. (1985) Journal of Virology, 53:672-
678), -5 (Cladaras, C. & Wold, W.S.M. (1985) Virology,
140:28-43), -7 (Hong, J.S., Mullis, K.G. & Engler,
J.A. (1988) Virology, 167:545-553) and -35 (Flomenberg,
P.R., Chen, M. & Horwitz, M.S. (1988) Journal of
Virology, 62:4431-4437), murine Ad1 (MAd1)
(Raviprakash, K.S., Grunhaus, A., El Kholy, M.A. &
Horwitz, M.S. (1989) Journal of Virology, 63:5455-
5458) and canine Ad1 (CAd1) (Dragulev, B.P., Sira, S.,
Abouhaidar, M.G. & Campbell, J.B. (1991) Virology,
183:298-305). Three of the BAV3 ORFs exhibited
homology with the characterized HAd proteins pVIII, fibre
and the 14.7K E3 protein. The amino acid sequence
predicted from BAV3 ORF 1 shows overall identity of
approximately 55% when compared to the C-terminal 75%
of HAd2 pVIII (Cladaras & Wold, 1985, supra) (Fig.
8a), indicating that ORF 1 encodes the right end of
BAV3 pVIII. Near the C-terminal end of BAV3 pVIII
there is a 67 amino acid stretch (residues 59 to 125;
Fig. 8a) which has 75% identity with HAd2 pVIII. This
region has previously been shown to be highly
conserved among different Ads (Cladaras & Wold, 1985,
supra; Signas, C., Akusjarvi, G. & Pettersson, U.
(1986) Gene, 50:173-184; Raviprakash et al., 1989,
supra; Dragulev et al., 1991, supra).
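
The ORF survey described above can be approximated with a short scan of the three reading frames of a strand. This is a minimal sketch, not the PC/GENE procedure used in the original analysis: it assumes an ORF runs from the first ATG in a frame to the next in-frame stop codon, counts the initiating methionine as a residue, and uses hypothetical function names. The lower strand can be examined by passing the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_residues=60):
    """Scan the three forward frames for ATG...stop ORFs of at least min_residues.

    Returns (start, stop_start, residues) tuples with 1-based coordinates:
    start is the A of the ATG, stop_start is the first base of the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                residues = (i - start) // 3
                if residues >= min_residues:
                    orfs.append((start + 1, i + 1, residues))
                start = None
    return sorted(orfs)


def reverse_complement(dna):
    """Reverse complement, for scanning the opposite strand."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical toy sequence: one ORF of Met plus 60 Ala codons.
toy = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
upper_orfs = find_orfs(toy, min_residues=60)
lower_orfs = find_orfs(reverse_complement(toy), min_residues=50)
```

Different programs apply different rules for start codons and minimum length, so counts obtained this way need not match those reported in the text.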

The fibre protein is present on the surface
of the virion as long projections from each vertex of
the icosahedral capsid and is involved in a number of
Ad functions including attachment of the virus to the
cell surface during infection, assembly of virions and
antigenicity (Philipson, L. (1983) Current Topics in
Microbiology and Immunology, 109:1-52). On the basis
of the primary structure of the HAd2 fibre protein, it has
been proposed that the shaft region (between amino
acid residues 40 and 400) is composed of a number of
repeating structural motifs containing about 15
hydrophobic residues organized in two short β-sheets
and two β-bends (Green, N.M., Wrigley, N.G., Russell,
W.C., Martin, S.R. & McLachlan, A.D. (1983) EMBO
Journal, 2:1357-1365). The amino acid sequences at
the N terminus of the BAV3 ORF 6-encoded protein share
about 60% identity with the HAd2 fibre protein tail,
but there is little or no similarity in the knob
region, and about 45% identity overall (Fig. 8c). The
BAV3 fibre gene would encode a protein of 976 residues
if no splicing occurs, i.e. 394 amino acid residues
longer than the HAd2 fibre protein. The number of
repeating motifs in the shaft region of the fibre
protein from different Ads varies between 28 and 23
(Signas et al., 1985, supra; Chroboczek, J. & Jacrot,
B. (1987) Virology, 161:549-554; Hong et al., 1988,
supra; Raviprakash et al., 1989, supra; Dragulev et
al., 1991, supra). The BAV3 fibre protein can be
organized into 52 such repeats in this region (not
shown), which would account for most of the difference
in size compared to those of HAd2, HAd3, HAd5, HAd7,
CAd1 and MAd1 (Signas et al., 1985, supra; Herisse et
al., 1980, supra; Herisse & Galibert, 1981, supra; Hong
et al., 1988, supra; Raviprakash et al., 1989, supra;
Dragulev et al., 1991, supra).
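
Identity figures such as those quoted for pVIII and the fibre protein are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over the aligned columns that contain no gap; the text does not state which denominator was used, so this choice, the function name and the short test strings are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residues to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, one mismatch in six columns.
example = percent_identity("ACDEFG", "ACDQFG")
# The same function with a gapped alignment: the gap column is ignored.
gapped = percent_identity("AC-EF", "ACDEF")
```

Using the length of the shorter sequence or the full alignment length as the denominator would give lower values for the same alignment.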

The E3 region of HAd2 and HAd5 lies between the pVIII and
the fibre genes and encodes at least 10 polypeptides
(Cladaras & Wold, 1985, supra). The promoter for E3 of
these two serotypes lies within the sequences encoding
pVIII, about 320 bp 5' of the termination codon. No
consensus TATA box is found in the corresponding
region of the BAV3 sequences. A non-canonical
polyadenylation signal (ATAAA) for E3 transcripts is
located at position 1723, between the end of the
putative E3 region and the beginning of ORF 6,
encoding the fibre protein, and two consensus signals
are located within ORF 6 at positions 2575 and 3565.
The polyadenylation signal for the fibre protein is
located at nucleotide 4877. Six ORFs were identified
in the BAV3 genome between the pVIII and the fibre
genes, but only four (ORFs 2, 3, 4 and 5) have the
potential to encode polypeptides of at least 50 amino
acids after an initiation codon (Fig. 7). The amino
acid sequence predicted to be encoded by ORF 2 is 307
residues long and contains eight potential
N-glycosylation sites (Fig. 7) as well as a hydrophobic
sequence which may be a potential transmembrane domain
(PLLFAFVLCTGCAVLLTAFGPSILSGT) between residues 262 and
289. This domain may be a part of the protein
homologous to the HAd2 and HAd5 19K E3 glycoprotein
(Cladaras & Wold, 1985, supra), and the proposed CAd1
22.2K protein (Dragulev et al., 1991, supra), but ORF
2 does not show appreciable homology with these
proteins. ORF 4 shows approximately 44% identity
with the 14.7K E3 protein of HAd5 (Fig. 6 and 8b),
which has been shown to prevent lysis of
virus-infected mouse cells by tumour necrosis factor
(Gooding, L.R., Elmore, L.W., Tollefson, A.E., Brody,
H.A. & Wold, W.S.M. (1988) Cell, 53:341-346; Wold,
W.S.M. & Gooding, L.R. (1989) Molecular Biology and
Medicine, 6:433-452). Analysis of the 14.7K protein
sequence from HAd2, -3, -5 and -7 has revealed a
highly conserved domain, which in HAd5 lies between
amino acid residues 41 and 56 (Horton, T.M.,
Tollefson, A.E., Wold, W.S.M. & Gooding, L.R. (1990)
Journal of Virology, 64:1250-1255). The corresponding
region in the BAV3 ORF 4-encoded protein, between
amino acids 70 and 85, contains 11 amino acids
identical to those of the HAd5 14.7K protein conserved
domain (Fig. 8b).
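
Potential N-glycosylation sites such as those noted for ORF 2 are usually identified by scanning for the Asn-X-Ser/Thr sequon. The text does not state the criterion that was applied, so the sketch below assumes the conventional rule in which X is any residue except proline; the function name and the test sequence are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein):
    """1-based positions of Asn residues in Asn-X-Ser/Thr sequons (X is not Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy sequence: N-G-S and the overlapping N-N-T / N-T-S match,
# while N-P-T is excluded by the proline rule.
toy = "MNGSANPTKNNTS"
sites = n_glycosylation_sites(toy)
```

A sequon only marks a potential site; whether it is glycosylated depends on the protein's topology and cannot be read from the sequence alone.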

The BAV3 E3 region appears to be
approximately 1.5 kbp long, about half the size of
those of HAd2 and -5 (Cladaras & Wold, 1985, supra),
and novel splicing events in BAV3 E3 would be required
to generate more homologues to the HAd3 E3 proteins.
A similarly short E3 region has been reported for MAd1
(Raviprakash et al., 1989, supra) and CAd1 (Dragulev
et al., 1991, supra).

Example 4 Construction of BAV3-luciferase recombinants

Adenovirus-based mammalian cell expression
vectors have gained tremendous importance in the last
few years as a vehicle for recombinant vaccine
delivery, and also in gene therapy. BAV3-based
expression vectors have great potential for
developing novel recombinant vaccines for veterinary
use. To show that BAV3 E3 gene products are not
essential for virus growth in cultured cells and that this
locus could be used to insert foreign DNA sequences, a
1.7 kb fragment containing the firefly luciferase gene
was introduced into the 696 bp deletion of the E3 region
of the BAV3 genome in the E3 parallel orientation to
generate a BAV3 recombinant.

The rationale for using the luciferase gene
is that it acted as a highly sensitive reporter gene
when introduced in the E3 region of the HAd5 genome to
generate HAd5-Luc recombinants (Mittal et al (1993)
Virus Res. 28:67-90).

To facilitate the insertion of the firefly
luciferase gene into the E3 region of the BAV3
genome, a BAV3 E3 transfer vector containing the
luciferase gene was constructed (Fig. 9). The BAV3 E3
region falls approximately between m.u. 77 and 82. In
our first series of vectors we replaced a 696 bp
XhoI-NcoI E3 fragment (between m.u. 78.8 and 80.8) with
NruI-SalI cloning sites for insertion of foreign genes
to obtain pSM14del2. A 1716 bp BsmI-SspI fragment
containing the luciferase gene was isolated and first
inserted into an intermediate plasmid, pSM41, in the
E3 locus at the SalI site by blunt end ligation to
generate pSM41-Luc. The luciferase gene, without any
exogenous regulatory sequences, was inserted into the
E3 locus in the same orientation as the E3
transcription unit. The kanʳ gene was inserted into
pSM41-Luc at the XbaI site present within the
luciferase gene to generate an ampʳ/kanʳ plasmid,
pSM41-Luc-Kan. A 7.7 kb fragment containing the BAV3
sequences along with the luciferase gene and the kanʳ
gene was obtained from pSM41-Luc-Kan by digestion with
BamHI and inserted into an ampʳ plasmid, pSM51,
partially digested with BamHI to replace a 3.0 kb
BamHI fragment (which lies between m.u. 77.8 and 86.4) to
generate a doubly resistant (kanʳ and ampʳ) plasmid,
pSM51-Luc-Kan. The kanʳ gene was deleted from
pSM51-Luc-Kan by partial cleavage with XbaI to generate
pSM51-Luc containing the luciferase gene in the
E3-parallel orientation.

MDBK cells transformed with a plasmid
containing the BAV3 E1 sequences were cotransfected
with wt BAV3 DNA digested with PvuI, which makes
two cuts within the BAV3 genome at m.u. 65.7 and 71.1,
and the plasmid pSM51-Luc to rescue the luciferase
gene in E3 of the BAV3 genome by in vivo recombination
(Fig. 10). The digestion of the wt BAV3 DNA with PvuI
was helpful in minimizing the generation of wt
virus plaques following cotransfection. The left end
of the wt BAV3 genome, represented by the PvuI 'A' fragment,
which falls between m.u. 0 and 65.7, and pSM51-Luc, which
extends between m.u. 31.5 and 100 (except for the E3
deletion replaced with the luciferase gene), have
sufficient overlapping BAV3 DNA sequences to generate
recombinant viruses.
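
The extent of the overlap available for recombination follows directly from the map positions given above. The sketch below is a worked calculation only; it assumes a genome of roughly 35 kb (the approximate size cited for BAV3) to convert map units into base pairs, and the function names are hypothetical.

```python
GENOME_BP = 35_000  # assumption: approximate BAV3 genome size of about 35 kb


def mu_to_bp(map_units, genome_bp=GENOME_BP):
    """Convert map units (percent of genome length) to base pairs."""
    return map_units / 100.0 * genome_bp


def overlap_mu(a, b):
    """Map units shared by two (start, end) intervals; 0 if they do not overlap."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


pvui_a = (0.0, 65.7)       # left-end PvuI 'A' fragment of wt BAV3 DNA
psm51_luc = (31.5, 100.0)  # BAV3 sequences carried by pSM51-Luc

shared_mu = overlap_mu(pvui_a, psm51_luc)
shared_kb = mu_to_bp(shared_mu) / 1000.0
# The E3 deletion spans m.u. 78.8 to 80.8, close to the stated 696 bp.
deletion_bp = mu_to_bp(80.8 - 78.8)
```

With these figures the two molecules share about 34 map units, on the order of 12 kb under the assumed genome size.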

Two virus plaques were obtained in two
independent cotransfection experiments and were
grown in MDBK cells. The viral DNA from both plaques
was extracted and analyzed by agarose gel
electrophoresis after digestion with BamHI,
EcoRI or XbaI to identify the presence and orientation
of the luciferase gene in the viral genome (data not
shown). In the genomes of both recombinants, the
luciferase gene was present in the E3 region in the E3
parallel orientation. The BAV3-luciferase
recombinants were plaque purified and named BAV3-Luc
(3.1) and BAV3-Luc (3.2) to represent plaques obtained
from two independent experiments. Since both
recombinant virus isolates were identical they will be
referred to as BAV3-Luc. The presence of the
luciferase gene in the BAV3-Luc isolates was further
confirmed by Southern blot analyses and luciferase
assays using extracts from recombinant virus-infected
cells.

Characterization of BAV3 recombinants

Southern blot analyses of wt BAV3 and 
recombinant genomic DNA digested with BamHI, 
EcoRI or XbaI were carried out to confirm the 
presence and orientation of the luciferase gene in the 
E3 locus and the deletion of the 696 bp XhoI-NcoI 
fragment from E3 of the BAV3-Luc genome (Fig. 11). 
When the blot was probed with the 696 bp XhoI-NcoI 
fragment of E3 of the BAV3 genome (panel A, lanes 4 
to 9), no hybridization signal was detected with the 
DNA fragments from the recombinant viruses; however, 
the expected bands (3.0 kb BamHI, 8.1 kb EcoRI, and 
18.5 kb XbaI) of the wt BAV3 DNA fragments (panel A, 
lanes 10 to 12) showed hybridization, confirming that 
the 696 bp XhoI-NcoI fragment of the E3 region was 
indeed deleted in the BAV3-Luc genomic DNA. In panel 
B, when an identical blot was probed with the 
luciferase gene, there were strong hybridization 
signals with the DNA fragments from the recombinant 
viruses (4.0 kb BamHI (lanes 4 & 7), 6.0 kb & 3.2 kb 
EcoRI (lanes 5 & 8), 16.7 kb & 2.9 kb XbaI (lanes 
6 & 9)). These results confirmed that BAV3-Luc 
contains the luciferase gene in the E3-parallel 
orientation with a 696 bp XhoI-NcoI E3 deletion. 
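
As a rough consistency check, the fragment sizes quoted above can be used to infer the approximate size of the sequence inserted in place of the 696 bp deletion. The Python sketch below is illustrative arithmetic only. It assumes that, for each enzyme, the luciferase-positive recombinant fragment or fragments together replace the single wt fragment that hybridized with the E3 probe; the names used are hypothetical.

```python
# Illustrative arithmetic only: infer the approximate size of the inserted
# luciferase sequence from the Southern blot fragment sizes quoted in the text.
# ASSUMPTION: in each digest the luciferase-positive recombinant fragment(s)
# together replace the single wt fragment that carried the deleted E3 segment.

E3_DELETION_KB = 0.696  # XhoI-NcoI fragment removed from E3

# enzyme -> (wt fragment in kb, recombinant fragment(s) in kb)
digests = {
    "BamHI": (3.0, [4.0]),
    "EcoRI": (8.1, [6.0, 3.2]),
    "XbaI": (18.5, [16.7, 2.9]),
}


def implied_insert_kb(wt_kb, recombinant_kb, deletion_kb=E3_DELETION_KB):
    """Insert size = recombinant total - (wt fragment - deleted segment)."""
    return sum(recombinant_kb) - (wt_kb - deletion_kb)


estimates = {
    enzyme: implied_insert_kb(wt, rec) for enzyme, (wt, rec) in digests.items()
}
for enzyme, size in estimates.items():
    print(f"{enzyme}: implied insert ~{size:.1f} kb")
```

Under this assumption the three digests give mutually consistent estimates of roughly 1.7 to 1.8 kb, which is within the precision expected of fragment sizes read from an agarose gel.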

The growth characteristics of the 
recombinant viruses were compared with those of wt 
BAV3 in a single-step growth curve (Fig. 12). Virus 
titers in MDBK cells infected with wt BAV3 started 
increasing at 12 h post-infection, reached a maximum 
at 36-48 h post-infection and declined 
thereafter. Virus titers of the recombinant viruses 
also started increasing at 12 h post-infection, reached 
a maximum at 48 h post-infection and then declined; 
however, the titers of the recombinant viruses remained 
approximately one log lower than those of the wt 
virus. The plaques of the recombinant viruses were 
also smaller than those of the wt virus (data not 
shown). 

Kinetics of luciferase expression by BAV3-Luc 

Luciferase activity in BAV3-Luc-infected 
MDBK cells was monitored at different times 
post-infection by luciferase assays (Fig. 13). A low 
level of luciferase activity was first observed at 12 h 
post-infection; activity reached a peak at 30 h 
post-infection and then dropped. At 30 h 
post-infection, approximately 425 pg luciferase was 
detected in 4 x 10^5 BAV3-Luc (3.1)-infected MDBK cells. 
In MDBK cells infected with wt BAV3, luciferase 
expression was not detected (data not shown). The 
kinetics of luciferase expression by BAV3-Luc (3.1) 
and BAV3-Luc (3.2) appeared very similar. The 
kinetics of luciferase expression also showed that the 
majority of enzyme expression in virus-infected cells 
seemed to occur late in infection. To determine 
luciferase expression in the absence of viral DNA 
replication, BAV3-Luc-infected MDBK cells were 
incubated in the presence of an inhibitor of DNA 
synthesis, 1-β-D-arabinofuranosylcytosine (AraC), and 
luciferase activity was measured in virus-infected 
cell extracts at various times post-infection and 
compared to luciferase expression obtained in the 
absence of AraC (Fig. 14). When the recombinant 
virus-infected cells were incubated in the presence of 
AraC, luciferase expression at 18, 24 and 30 h 
post-infection was approximately 20-30% of the value 
obtained in the absence of AraC. These results 
indicated that the majority of luciferase expression 
in MDBK cells infected with BAV3-Luc took place after 
the onset of viral DNA synthesis. To confirm this, 
MDBK cells infected with BAV3-Luc were grown in 
the absence or presence of AraC and harvested at 18 h, 
24 h, and 30 h post-infection, and viral DNA was 
extracted and analyzed by dot blot analysis using 
pSM51-Luc (see Fig. 9) as a probe (data not shown). In 
the presence of AraC, viral DNA synthesis was severely 
reduced compared to viral DNA synthesis in the absence 
of AraC. 
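
The yield of approximately 425 pg luciferase in 4 x 10^5 infected cells at 30 h post-infection can be expressed per cell. The Python sketch below is an illustrative calculation only. It assumes a molecular weight of 62 kDa (the band size reported for luciferase in the Western blot analysis) and treats all cells as contributing equally; the function names are hypothetical.

```python
# Illustrative calculation: average luciferase yield per infected cell.
# ASSUMPTIONS: 62 kDa molecular weight (band size from the Western blot),
# and every cell in the sample contributing equally to the total.

AVOGADRO = 6.022e23        # molecules per mole
LUCIFERASE_MW_DA = 62_000  # assumed molecular weight in daltons (g/mol)


def mass_per_cell_fg(total_pg, n_cells):
    """Average luciferase mass per cell in femtograms (1 pg = 1000 fg)."""
    return total_pg * 1000.0 / n_cells


def molecules_per_cell(total_pg, n_cells, mw_da=LUCIFERASE_MW_DA):
    """Average number of luciferase molecules per cell."""
    grams = total_pg * 1e-12
    return grams / mw_da * AVOGADRO / n_cells


fg = mass_per_cell_fg(425, 4e5)
mol = molecules_per_cell(425, 4e5)
print(f"~{fg:.2f} fg and ~{mol:.1e} molecules of luciferase per cell")
```

On these assumptions the yield is on the order of 1 fg, or about 10^4 molecules, of luciferase per cell.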

Western blot analysis of BAV3-Luc-infected cells 

Luciferase was expressed as an active enzyme, 
as determined by luciferase assays using extracts from 
MDBK cells infected with BAV3-Luc (see Fig. 13). The 
luciferase gene without any exogenous regulatory 
sequences was inserted into E3 of the BAV3 genome; 
therefore, there was a possibility of luciferase 
expression as a fusion protein with part of an E3 
protein if the luciferase gene was in the same frame 
as an E3 open reading frame (ORF), such as F1 or F3 
(Fig. 15), or the fusion protein 
may arise due to recognition of an upstream initiation 
codon in the luciferase ORF. To explore this 
possibility, we sequenced the DNA at the junction of 
the luciferase gene and the BAV3 sequences using 
the plasmid pSM51-Luc and a synthetic primer 
designed to bind luciferase coding sequences near the 
initiation codon (data not shown). The luciferase 
coding region fell in frame F2. The luciferase 
initiation codon was the first start codon in this 
frame; however, the ORF started 84 nucleotides 
upstream of the luciferase start codon. To further 
confirm that the luciferase protein is of the same 
molecular weight as purified firefly luciferase, 
unlabeled mock-infected, wt BAV3-infected or 
BAV3-Luc-infected MDBK cell extracts were reacted with 
an anti-luciferase antibody in a Western blot 
(Fig. 16). A 62 kDa polypeptide band was visible in 
the BAV3-Luc-infected cell extracts (lanes 3 and 4), 
which was of the same molecular weight as pure firefly 
luciferase (lane 5). We are not sure whether a band 
of approximately 30 kDa, which also reacted with the 
anti-luciferase antibody in lanes 3 and 4, represented 
a degraded luciferase protein. 
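
The reading-frame reasoning above can be illustrated with a short Python sketch: an open frame that begins 84 nucleotides upstream of the luciferase start codon is in frame with it (84 is a multiple of 3) and spans 28 codons, and the first ATG met when reading along that frame is the one at which translation would be expected to start. The toy sequence and function names below are hypothetical and are not taken from pSM51-Luc.

```python
# Illustrative sketch of the reading-frame reasoning.
# ASSUMPTION: the toy sequence below is invented for demonstration only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_upstream(offset_nt):
    """Whole codons in an in-frame upstream stretch of the given length."""
    if offset_nt % 3 != 0:
        raise ValueError("offset is not in frame with the start codon")
    return offset_nt // 3


def first_start_in_frame(seq, frame_start=0):
    """Index of the first ATG read in this frame before any stop codon."""
    for i in range(frame_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return None
        if codon == "ATG":
            return i
    return None


upstream_codons = codons_upstream(84)

# Hypothetical open frame: three non-ATG codons, then the start codon.
toy_frame = "GCTGCAGGT" + "ATGGAAGAC"
start_index = first_start_in_frame(toy_frame)
print(upstream_codons, start_index)
```

In the toy example the open frame begins at index 0, but the first in-frame ATG lies at index 9, mirroring the situation described for frame F2.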

The majority of luciferase expression is 
probably driven from the major late promoter (MLP), 
providing expression that parallels viral late gene 
expression; moreover, the enzyme expression seen in 
the presence of AraC may be taking place from the E3 
promoter. In HAd5 vectors, foreign genes without any 
exogenous regulatory sequences, when inserted in E3, 
also displayed late kinetics and were inhibited by 
AraC. The BAV3 recombinant virus replicated 
relatively well in cultured cells, but not as well as 
wt BAV3. This is not surprising, as infectious 
virus titers of a number of HAd5 recombinants were 
slightly lower than those of wt HAd5 (Bett et al 
(1993) J. Virol. 67:5911-5921). This may be because of 
reduced expression of fiber protein in recombinant 
adenoviruses having inserts in the E3 region compared 
to the wt virus (Bett et al, supra and Mittal et al 
(1993) Virus Res. 28:67-90). 

The E3 of BAV3 is approximately half the 
size of the E3 region of HAd2 or HAd5 and thus has the 
coding potential for only half the number of proteins 
compared to E3 of HAd2 or HAd5 (Cladaras et al (1985) 
Virology 140:28-43; Herisse et al (1980) Nuc. Acids 
Res. 8:2173-2192; Herisse et al (1981) Nuc. Acids Res. 
9:1229-1249 and Mittal et al (1993) J. Gen. Virol. 
73:3295-3300). BAV3 E3 gene products have been shown 
not to be required for virus growth in tissue culture. 
However, presently it is known that BAV3 E3 gene 
products also evade immune surveillance in vivo, like 
HAd E3 proteins. One of the BAV3 E3 open reading 
frames (ORFs) has been shown to have amino acid 
homology with the 14.7 kDa E3 protein of HAds (Mittal 
et al (1993) supra). The 14.7 kDa E3 protein of HAds 
prevents lysis of virus-infected mouse cells by tumour 
necrosis factor (Gooding et al (1988) Cell 53:341-346 
and Horton et al (1990) J. Virol. 64:1250-1255). The 
study of pathogenesis and immune responses of a series 
of BAV3 E3 deletion mutants in cattle provides very 
useful information regarding the role of E3 gene 
products in modulating immune responses in their 
natural host. 

The BAV3-based vector has a 0.7 kb E3 
deletion which can hold an insert up to 2.5 kb in 
size. The BAV3 E3 deletion can probably be extended 
up to 1.4 kb, which in turn would also increase the 
insertion capacity of this system. The role of the MLP 
and the E3 promoter is examined to determine their 
ability to drive expression of a foreign gene inserted 
into E3 when a proper polyadenylation signal is 
provided. Exogenous promoters, such as the simian 
virus 40 (SV40) promoter (Subramani et al (1983) Anal. 
Biochem. 135:1-15), the human cytomegalovirus immediate 
early promoter (Boshart et al (1985) Cell 43:215-222), 
and the human beta-actin promoter (Gunning et al (1987) 
PNAS USA 84:4831-4835), are tested to evaluate their 
ability to facilitate expression of foreign genes when 
introduced into E3 of the BAV3 genome. 
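
The relationship between deletion size and insert capacity stated above can be sketched numerically. The Python example below is illustrative only and rests on two assumptions that are not established for BAV3 in this description: a genome of about 35 kb, and a packaging limit of about 105% of wild-type genome length, a figure reported for HAd5 vectors. The names are hypothetical.

```python
# Illustrative sketch: insert capacity as deletion size plus packaging headroom.
# ASSUMPTIONS (not established for BAV3 here): ~35 kb genome and a packaging
# limit of ~105% of wt genome length, as reported for HAd5 vectors.

GENOME_KB = 35.0        # assumed approximate BAV3 genome size
PACKAGING_LIMIT = 1.05  # assumed maximum packaged length relative to wt


def insert_capacity_kb(deletion_kb, genome_kb=GENOME_KB, limit=PACKAGING_LIMIT):
    """Largest insert that keeps the recombinant genome within the limit."""
    headroom_kb = (limit - 1.0) * genome_kb
    return deletion_kb + headroom_kb


current = insert_capacity_kb(0.7)   # existing 0.7 kb E3 deletion
extended = insert_capacity_kb(1.4)  # possible 1.4 kb E3 deletion
print(f"0.7 kb deletion: ~{current:.2f} kb; 1.4 kb deletion: ~{extended:.2f} kb")
```

Under these assumptions the 0.7 kb deletion accommodates about 2.45 kb, in line with the figure of up to 2.5 kb stated above, and a 1.4 kb deletion would accommodate about 3.15 kb.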

Recently, HAd-based expression vectors have 
come under close scrutiny for their potential use in 
human gene therapy (Ragot et al (1993) Nature 
361:647-650; Rosenfeld et al (1991) Science 
252:431-434; Rosenfeld et al (1992) Cell 68:141-155 and 
Stratford-Perricaudet et al (1990) Hum. Gene Ther. 
1:241-256). A preferable adenovirus vector for gene 
therapy would be one which maintains expression of the 
required gene indefinitely or for a long period in the 
target organ or tissue. This may be achieved if the 
recombinant virus vector genome is incorporated into 
the host genome or maintains an independent 
existence extrachromosomally without active virus 
replication. HAds replicate very well in humans, 
their natural host. HAds can be made defective in 
replication by deleting the E1 region; however, how 
such vectors would maintain the expression of the 
target gene in the required fashion is not very clear. 
Moreover, the presence of anti-HAd antibodies in 
almost every human being may create some problems with 
the HAd-based delivery system. Adenovirus genomes 
have a tendency to form circles in non-permissive 
cells. BAV-based vectors could provide a possible 
alternative to HAd-based vectors for human gene 
therapy. As BAV3 does not replicate in humans, the 
recombinant BAV3 genomes may be maintained as 
independent circles in human cells, providing 
expression of the essential protein for a long period 
of time. 

Foreign gene insertion in animal 
adenoviruses is much more difficult than in HAds 
because it is hard to develop a cell line which is also 
good for adenovirus DNA-mediated transfection. This 
may be one of the major reasons that the development 
of an animal adenovirus-based expression system has 
not been reported so far. It took us more than a year 
to isolate a cell line suitable for BAV3 DNA-mediated 
transfection. However, the rapid implementation of 
BAV-based expression vectors for the production of 
live virus recombinant vaccines for farm animals is 
very promising. BAVs grow in the respiratory and 
gastrointestinal tracts of cattle; therefore, 
recombinant BAV-based vaccines can be used to provide 
a protective mucosal immune response, in addition to 
humoral and cellular immune responses, against 
pathogens where mucosal immunity plays a major role in 
protection. 

Example 5 Generation of cell lines transformed with 
the BAV3 E1 sequences 

MDBK cells in monolayer cultures were 
transfected with pSM71-neo, pSM61-kan1 or pSM61-kan2 
by a lipofection-mediated transfection technique 
(GIBCO/BRL, Life Technologies, Inc., Grand Island, 
NY). At 48 h after transfection, cells were 
maintained in MEM supplemented with 5% fetal 
bovine serum and 700 ng/ml G418. The medium was 
changed every 3rd day. In the presence of G418, only 
those cells grow which have stably incorporated 
the plasmid DNA used in the transfection experiments 
into their genomes and are expressing the neo^r gene. 
The cells which have incorporated the neo^r gene might 
also have taken up the BAV3 E1 sequences and thus be 
expressing BAV3 E1 protein(s). A number of neo^r 
(i.e., G418-resistant) colonies were isolated, expanded 
and tested for the presence of BAV3 E1 message(s) by 
Northern blot analyses using a DNA probe containing 
only the BAV3 E1 sequences. Expression of BAV3 E1 
protein(s) was confirmed by a complementation assay 
using a HAd5 deletion mutant defective in E1 function 
due to an E1 deletion. 

Fetal bovine kidney cells in monolayers were 
also transfected with pSM71-neo, pSM61-kan1 or 
pSM61-kan2 by the lipofection-mediated transfection 
technique, electroporation (Chu et al (1987) Nucl. 
Acids Res. 15:1311-1326), or the calcium phosphate 
precipitation technique (Graham et al (1973) Virology 
52:456-467). Similarly, a number of G418-resistant 
colonies were isolated, expanded and tested for the 
presence of BAV3 E1 gene products as mentioned above. 

Example 6 Generation of a BAV3 recombinant containing 
the beta-galactosidase gene as an E1 insert 

As E1 gene products are essential for virus 
replication, adenovirus recombinants containing E1 
inserts will grow only in a cell line which is 
transformed with the adenovirus E1 sequences and 
expresses E1. A number of cell lines 
transformed with the BAV3 E1 sequences were isolated 
as described earlier. The technique of foreign gene 
insertion into the E1 region is similar to that of 
gene insertion into the E3 region of the BAV3 genome; 
however, insertion into E1 requires an 
E1 transfer plasmid which contains DNA sequences from 
the left end of the BAV3 genome, an appropriate 
deletion and a cloning site for the insertion of 
foreign DNA sequences. G418-resistant MDBK cell 
monolayers were cotransfected with wild-type (wt) 
BAV3 DNA and pSM71-Z following the 
lipofection-mediated transfection procedure (GIBCO/BRL, 
Life Technologies, Inc., Grand Island, NY). The 
monolayers were incubated at 37 °C under an agarose 
overlay. After a week of incubation, another layer of 
overlay containing 300 µg/ml Blu-gal™ (GIBCO/BRL 
Canada, Burlington, Ontario, Canada) was put onto each 
monolayer. The blue plaques were isolated and plaque 
purified, and the presence of the beta-galactosidase 
gene in the BAV3 genome was identified by agarose gel 
electrophoresis of recombinant virus DNA digested with 
suitable restriction enzymes and confirmed by 
beta-galactosidase assays using extracts from 
recombinant virus-infected cells. 

Deposit of Biological Materials 
The following materials were deposited and 
are maintained with the Veterinary Infectious Disease 
Organization (VIDO), Saskatoon, Saskatchewan, Canada. 

The nucleotide sequences of the deposited 
materials are incorporated by reference herein, as 
well as the sequences of the polypeptides encoded 
thereby. In the event of any discrepancy between a 
sequence expressly disclosed herein and a deposited 
sequence, the deposited sequence is controlling. 

Material                             Internal Accession No.   Deposit Date 

Recombinant plasmids 

pSM51                                pSM51                    Dec 6, 1993 
pSM71                                pSM71                    Dec 6, 1993 

Recombinant cell lines 

MDBK cells transformed with          MDBK-BAVE1               Dec 6, 1993 
BAV3 E1 sequences 
Fetal bovine kidney cells            FBK-BAV-E1               Dec 6, 1993 
transformed with BAV3 E1 sequences 


While the present invention has been 
illustrated above by certain specific embodiments, the 
specific examples are not intended to limit the scope 
of the invention as described in the appended claims. 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

     (i) APPLICANT: UNIVERSITY OF SASKATCHEWAN 

    (ii) TITLE OF INVENTION: RECOMBINANT PROTEIN PRODUCTION IN BOVINE 
         ADENOVIRUS EXPRESSION VECTOR SYSTEM 

   (iii) NUMBER OF SEQUENCES: 34 

    (iv) CORRESPONDENCE ADDRESS: 
          (A) ADDRESSEE: SCOTT & AYLEN 
          (B) STREET: 60 QUEEN STREET 
          (C) CITY: OTTAWA 
          (D) PROVINCE: ONTARIO 
          (E) COUNTRY: CANADA 
          (F) POSTAL CODE: K1P 5Y7 

     (v) COMPUTER READABLE FORM: 
          (A) MEDIUM TYPE: Floppy disk 
          (B) COMPUTER: IBM PC compatible 
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS 
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

    (vi) CURRENT APPLICATION DATA: 
          (A) APPLICATION NUMBER: 
          (B) FILING DATE: 
          (C) CLASSIFICATION: 

  (viii) ATTORNEY/AGENT INFORMATION: 
          (A) NAME: JOAN M. VAN ZANT 
          (B) REFERENCE/DOCKET NUMBER: PAT 21976TW-90 

    (ix) TELECOMMUNICATION INFORMATION: 
          (A) TELEPHONE: 1-416-368-2400 
          (B) TELEFAX: 1-416-363-7246 

(2) INFORMATION FOR SEQ ID NO:1: 

     (i) SEQUENCE CHARACTERISTICS: 
          (A) LENGTH: 4060 base pairs 
          (B) TYPE: nucleic acid 
          (C) STRANDEDNESS: double 
          (D) TOPOLOGY: linear 

    (ii) MOLECULE TYPE: DNA (genomic) 

    (ix) FEATURE: 
          (A) NAME/KEY: CDS 
          (B) LOCATION: join(606..1215, 1323..1345) 

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60 
CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120 
CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180 
CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240 
TAGCAAATTT GCGCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300 
TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360 
AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420 
CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480 
ACAAATTTGC CGAGTAATTG TGCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540 
CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600 

CCACC ATG AAG TAC CTG GTC CTC GTT CTC AAC GAC GGC ATG AGT CGA 647 
      Met Lys Tyr Leu Val Leu Val Leu Asn Asp Gly Met Ser Arg 
      1               5                   10 

ATT GAA AAA GCT CTC CTG TGC AGC GAT GGT GAG GTG GAT TTA GAG TGT 695 
Ile Glu Lys Ala Leu Leu Cys Ser Asp Gly Glu Val Asp Leu Glu Cys 
15                  20                  25                  30 

CAT GAG GTA CTT CCC CCT TCT CCC GCG CCT GTC CCC GCT TCT GTG TCA 743 
His Glu Val Leu Pro Pro Ser Pro Ala Pro Val Pro Ala Ser Val Ser 
                35                  40                  45 

CCC GTG AGG AGT CCT CCT CCT CTG TCT CCG GTG TTT CCT CCG TCT CCG 791 
Pro Val Arg Ser Pro Pro Pro Leu Ser Pro Val Phe Pro Pro Ser Pro 
            50                  55                  60 

CCA GCC CCG CTT GTG AAT CCA GAG GCG AGT TCG CTG CTG CAG CAG TAT 839 
Pro Ala Pro Leu Val Asn Pro Glu Ala Ser Ser Leu Leu Gln Gln Tyr 
        65                  70                  75 

CGG AGA GAG CTG TTA GAG AGG AGC CTG CTC CGA ACG GCC GAA GGT CAG 887 
Arg Arg Glu Leu Leu Glu Arg Ser Leu Leu Arg Thr Ala Glu Gly Gln 
    80                  85                  90 

CAG CGT GCA GTG TGT CCA TGT GAG CGG TTG CCC GTG GAA GAG GAT GAG 935 
Gln Arg Ala Val Cys Pro Cys Glu Arg Leu Pro Val Glu Glu Asp Glu 
95                  100                 105                 110 

TGT CTG AAT GCC GTA AAT TTG CTG TTT CCT GAT CCC TGG CTA AAT GCA 983 
Cys Leu Asn Ala Val Asn Leu Leu Phe Pro Asp Pro Trp Leu Asn Ala 
                115                 120                 125 

GCT GAA AAT GGG GGT GAT ATT TTT AAG TCT CCG GCT ATG TCT CCA GAA 1031 
Ala Glu Asn Gly Gly Asp Ile Phe Lys Ser Pro Ala Met Ser Pro Glu 
            130                 135                 140 

CCG TGG ATA GAT TTG TCT AGC TAC GAT AGC GAT GTA GAA GAG GTG ACT 1079 
Pro Trp Ile Asp Leu Ser Ser Tyr Asp Ser Asp Val Glu Glu Val Thr 
        145                 150                 155 

AGT CAC TTT TTT CTG GAT TGC CCT GAA GAC CCC AGT CGG GAG TGT TCA 1127 
Ser His Phe Phe Leu Asp Cys Pro Glu Asp Pro Ser Arg Glu Cys Ser 
    160                 165                 170 

TCT TGT GGG TTT CAT CAG GCT CAA AGC GGA ATT CCA GGC ATT ATG TGC 1175 
Ser Cys Gly Phe His Gln Ala Gln Ser Gly Ile Pro Gly Ile Met Cys 
175                 180                 185                 190 

AGT TTG TGC TAC ATG CGC CAA ACC TAC CAT TGC ATC TAT A GTAAGTACAT 1225 
Ser Leu Cys Tyr Met Arg Gln Thr Tyr His Cys Ile Tyr 
                195                 200 

TCTGTAAAAG AACATCTTGG TGATTTCTAG GTATTGTTTA GGGATTAACT GGGTGGAGTG 1285 

ATCTTAATCC GGCATAACCA AATACATGTT TTCACAG GT CCA GTT TCT GAA GAG 1339 
                                        Ser Pro Val Ser Glu Glu 
                                            205 

GAA ATG TGAGTCATGT TGACTTTGGC GCGCAAGAGG AAATGTGAGT CATGTTGACT 1395 
Glu Met 
210 

TTGGCGCGCC CTACGGTGAC TTTAAAGCAA TTTGAGGATC ACTTTTTTGT TAGTCGCTAT 1455 
AAAGTAGTCA CGGAGTCTTC ATGGATCACT TAAGCGTTCT TTTGGATTTG AAGCTGCTTC 1515 
GCTCTATCGT AGCGGGGGCT TCAAATCGCA CTGGAGTGTG GAAGAGGCGG CTGTGGCTGG 1575 
GACGCCTGAC TCAACTGGTC CATGATACCT GCGTAGAGAA CGAGAGCATA TTTCTCAATT 1635 
CTCTGCCAGG GAATGAAGCT TTTTTAAGGT TGCTTCGGAG CGGCTATTTT GAAGTGTTTG 1695 
ACGTGTTTGT GGTGCCTGAG CTGCATCTGG ACACTCCGGG TCGAGTGGTC GCCGCTCTTG 1755 
CTCTGCTGGT GTTCATCCTC AACGATTTAG ACGCTAATTC TGCTTCTTCA GGCTTTGATT 1815 
CAGGTTTTCT CGTGGACCGT CTCTGCGTGC CGCTATGGCT GAAGGCCAGG GCGTTCAAGA 1875 
TCACCCAGAG CTCCAGGAGC ACTTCGCAGC CTTCCTCGTC GCCCGACAAG ACGACCCAGA 1935 
CTACCAGCCA GTAGACGGGG ACAGCCCACC CCGGGCTAGC CTGGAGGAGG CTGAACAGAG 1995 
CAGCACTCGT TTCGAGCACA TCAGTTACCG AGACGTGGTG GATGACTTCA ATAGATGCCA 2055 
TGATGTTTTT TATGAGAGGT ACAGTTTTGA GGACATAAAG AGCTACGAGG CTTTGCCTGA 2115 
GGACAATTTG GAGCAGCTCA TAGCTATGCA TGCTAAAATC AAGCTGCTGC CCGGTCGGGA 2175 

GTATGAGTTG ACTCAACCTT TGAACATAAC ATCTTGCGCC TATGTGCTCG GAAATGGGGC 2235 

TACTATTAGG GTAACAGGGG AAGCCTCCCC GGCTATTAGA GTGGGGGCCA TGGCCGTGGG 2295 

TCCGTGTGTA ACAGGAATGA CTGGGGTGAC TTTTGTGAAT TGTAGGTTTG AGAGAGAGTC 2355 

AACAATTAGG GGGTCCCTGA TACGAGCTTC AACTCACGTG CTGTTTCATG GCTGTTATTT 2415 

TATGGGAATT ATGGGCACTT GTATTGAGGT GGGGGCGGGA GCTTACATTC GGGGTTGTGA 2475 

GTTTGTGGGC TGTTACCGGG GAATCTGTTC TACTTCTAAC AGAGATATTA AGGTGAGGCA 2535 

GTGCAACTTT GACAAATGCT TACTGGGTAT TACTTGTAAG GGGGACTATC GTCTTTCGGG 2595 

AAATGTGTGT TCTGAGACTT TCTGCTTTGC TCATTTAGAG GGAGAGGGTT TGGTTAAAAA 2655 

CAACACAGTC AAGTCCCCTA GTCGCTGGAC CAGCGAGTCT GGCTTTTCCA TGATAACTTG 2715 

TGCAGACGGC AGGGTTACGC CTTTGGGTTC CCTCCACATT GTGGGCAACC GTTGTAGGCG 2775 

TTGGCCAACC ATGCAGGGGA ATGTGTTTAT CATGTCTAAA CTGTATCTGG GCAACAGAAT 2835 

AGGGACTGTA GCCCTGCCCC AGTGTGCTTT CTACAAGTCC AGCATTTGTT TGGAGGAGAG 2895 

GGCGACAAAC AAGCTGGTCT TGGCTTGTGC TTTTGAGAAT AATGTACTGG TGTACAAAGT 2955 

GCTGAGACGG GAGAGTCCCT CAACCGTGAA AATGTGTGTT TGTGGGACTT CTCATTATGC 3015 

AAAGCCTTTG ACACTGGCAA TTATTTCTTC AGATATTCGG GCTAATCGAT ACATGTACAC 3075 

TGTGGACTCA ACAGAGTTCA CTTCTGACGA GGATTAAAAG TGGGCGGGGC CAAGAGGGGT 3135 

ATAAATAGGT GGGGAGGTTG AGGGGAGCCG TAGTTTCTGT TTTTCCCAGA CTGGGGGGGA 3195 

CAACATGGCC GAGGAAGGGC GCATTTATGT GCCTTATGTA ACTGCCCGCC TGCCCAAGTG 3255 

GTCGGGTTCG GTGCAGGATA AGACGGGCTC GAACATGTTG GGGGGTGTGG TACTCCCTCC 3315 

TAATTCACAG GCGCACCGGA CGGAGACCGT GGGCACTGAG GCCACCAGAG ACAACCTGCA 3375 

CGCCGAGGGA GCGCGTCGTC CTGAGGATCA GACGCCCTAC ATGATCTTGG TGGAGGACTC 3435 

TCTGGGAGGT TTGAAGAGGC GAATGGACTT GCTGGAAGAA TCTAATCAGC AGCTCCTGGC 3495 

AACTCTCAAC CGTCTCCGTA CAGGACTCGC TGCCTATGTG CAGGCTAACC TTGTGGGCGG 3555 

CCAAGTTAAC CCCTTTGTTT AAATAAAAAT ACACTCATAC AGTTTATTAT GCTGTCAATA 3615 

AAATTCTTTA TTTTTCCTGT GATAATACCG TGTCCAGCGT GCTCTGTCAA TAAGGGTCCT 3675 

ATGCATCCTG AGAAGGGCCT CATATACCCA TGGCATGAAT ATTAAGATAC ATGGGCATAA 3735 

GGCCCTCAGA AGGGTTGAGG TAGAGCCACT GCAGACTTTC GTGGGGAGGT AAGGTGTTGT 3795 

AAATAATCCA GTCATACTGA CTGTGCTGGG CGTGGAAGGA AAAGATGTCT TTTAGAAGAA 3855 

GGGTGATTGG CAAAGGGAGG CTCTTAGTGT AGGTATTGAT AAATCTGTTC AGTTGGGAGG 3915 

GATGCATTCG GGGGCTAATA AGGTGGAGTT TAGCCTGAAT CTTAAGGTTG GCAATGTTGC 3975 

CCCCTAGGTC TTTGCGAGGA TTCATGTTGT GCAGTACCAC AAAAACAGAG TAGCCTGTGC 4035 
ATTTGGGGAA TTTATCATGA AGCTT 4060 

(2) INFORMATION FOR SEQ ID NO:2: 

     (i) SEQUENCE CHARACTERISTICS: 
          (A) LENGTH: 211 amino acids 
          (B) TYPE: amino acid 
          (D) TOPOLOGY: linear 

    (ii) MOLECULE TYPE: protein 

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Lys Tyr Leu Val Leu Val Leu Asn Asp Gly Met Ser Arg Ile Glu 
1               5                   10                  15 

Lys Ala Leu Leu Cys Ser Asp Gly Glu Val Asp Leu Glu Cys His Glu 
            20                  25                  30 

Val Leu Pro Pro Ser Pro Ala Pro Val Pro Ala Ser Val Ser Pro Val 
        35                  40                  45 

Arg Ser Pro Pro Pro Leu Ser Pro Val Phe Pro Pro Ser Pro Pro Ala 
    50                  55                  60 

Pro Leu Val Asn Pro Glu Ala Ser Ser Leu Leu Gln Gln Tyr Arg Arg 
65                  70                  75                  80 

Glu Leu Leu Glu Arg Ser Leu Leu Arg Thr Ala Glu Gly Gln Gln Arg 
                85                  90                  95 

Ala Val Cys Pro Cys Glu Arg Leu Pro Val Glu Glu Asp Glu Cys Leu 
            100                 105                 110 

Asn Ala Val Asn Leu Leu Phe Pro Asp Pro Trp Leu Asn Ala Ala Glu 
        115                 120                 125 

Asn Gly Gly Asp Ile Phe Lys Ser Pro Ala Met Ser Pro Glu Pro Trp 
    130                 135                 140 

Ile Asp Leu Ser Ser Tyr Asp Ser Asp Val Glu Glu Val Thr Ser His 
145                 150                 155                 160 

Phe Phe Leu Asp Cys Pro Glu Asp Pro Ser Arg Glu Cys Ser Ser Cys 
                165                 170                 175 

Gly Phe His Gln Ala Gln Ser Gly Ile Pro Gly Ile Met Cys Ser Leu 
            180                 185                 190 

Cys Tyr Met Arg Gln Thr Tyr His Cys Ile Tyr Ser Pro Val Ser Glu 
        195                 200                 205 

Glu Glu Met 
    210 
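
The CDS location given for each nucleotide sequence can be checked against the stated length of the corresponding polypeptide. The Python sketch below is illustrative only: the helper name is hypothetical, it handles only simple `start..end` spans optionally wrapped in `join(...)`, and it assumes the CDS coordinates exclude the stop codon, as the sequences in this listing suggest.

```python
# Illustrative sketch: compare the length of a CDS location string with the
# stated length of the encoded polypeptide.
# ASSUMPTIONS: only simple "start..end" spans (optionally inside join(...))
# are handled, and the CDS coordinates exclude the stop codon.
import re


def cds_length(location):
    """Total number of nucleotides covered by a CDS location string."""
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    return sum(int(end) - int(start) + 1 for start, end in spans)


seq1_nt = cds_length("join(606..1215, 1323..1345)")  # SEQ ID NO:1
seq3_nt = cds_length("1476..1946")                   # SEQ ID NO:3

print(seq1_nt, seq1_nt // 3)  # compare with the 211 residues of SEQ ID NO:2
print(seq3_nt, seq3_nt // 3)  # compare with the 157 residues of SEQ ID NO:4
```

Both locations cover a whole number of codons (633 and 471 nucleotides), which agrees with the polypeptide lengths of 211 and 157 amino acids given for SEQ ID NO:2 and SEQ ID NO:4.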

(2) INFORMATION FOR SEQ ID NO:3: 

     (i) SEQUENCE CHARACTERISTICS: 
          (A) LENGTH: 4060 base pairs 
          (B) TYPE: nucleic acid 
          (C) STRANDEDNESS: double 
          (D) TOPOLOGY: linear 

    (ii) MOLECULE TYPE: DNA (genomic) 

    (ix) FEATURE: 
          (A) NAME/KEY: CDS 
          (B) LOCATION: 1476..1946 

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60 

CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120 

CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180 

CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240 

TAGCAAATTT GCGCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300 

TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360 

AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420 

CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480 

ACAAATTTGC CGAGTAATTG TGCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540 

CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600 

CCACCATGAA GTACCTGGTC CTCGTTCTCA ACGACGGCAT GAGTCGAATT GAAAAAGCTC 660 

TCCTGTGCAG CGATGGTGAG GTGGATTTAG AGTGTCATGA GGTACTTCCC CCTTCTCCCG 720 

CGCCTGTCCC CGCTTCTGTG TCACCCGTGA GGAGTCCTCC TCCTCTGTCT CCGGTGTTTC 780 

CTCCGTCTCC GCCAGCCCCG CTTGTGAATC CAGAGGCGAG TTCGCTGCTG CAGCAGTATC 840 

GGAGAGAGCT GTTAGAGAGG AGCCTGCTCC GAACGGCCGA AGGTCAGCAG CGTGCAGTGT 900 

GTCCATGTGA GCGGTTGCCC GTGGAAGAGG ATGAGTGTCT GAATGCCGTA AATTTGCTGT 960 

TTCCTGATCC CTGGCTAAAT GCAGCTGAAA ATGGGGGTGA TATTTTTAAG TCTCCGGCTA 1020 

TGTCTCCAGA ACCGTGGATA GATTTGTCTA GCTACGATAG CGATGTAGAA GAGGTGACTA 1080 

GTCACTTTTT TCTGGATTGC CCTGAAGACC CCAGTCGGGA GTGTTCATCT TGTGGGTTTC 1140 

ATCAGGCTCA AAGCGGAATT CCAGGCATTA TGTGCAGTTT GTGCTACATG CGCCAAACCT 1200 

ACCATTGCAT CTATAGTAAG TACATTCTGT AAAAGAACAT CTTGGTGATT TCTAGGTATT 1260 

GTTTAGGGAT TAACTGGGTG GAGTGATCTT AATCCGGCAT AACCAAATAC ATGTTTTCAC 1320 

AGGTCCAGTT TCTGAAGAGG AAATGTGAGT CATGTTGACT TTGGCGCGCA AGAGGAAATG 1380 

TGAGTCATGT TGACTTTGGC GCGCCCTACG GTGACTTTAA AGCAATTTGA GGATCACTTT 1440 

TTTGTTAGTC GCTATAAAGT AGTCACGGAG TCTTC ATG GAT CAC TTA AGC GTT 1493 
                                       Met Asp His Leu Ser Val 
                                       1               5 

CTT TTG GAT TTG AAG CTG CTT CGC TCT ATC GTA GCG GGG GCT TCA AAT 1541 
Leu Leu Asp Leu Lys Leu Leu Arg Ser Ile Val Ala Gly Ala Ser Asn 
            10                  15                  20 

CGC ACT GGA GTG TGG AAG AGG CGG CTG TGG CTG GGA CGC CTG ACT CAA 1589 
Arg Thr Gly Val Trp Lys Arg Arg Leu Trp Leu Gly Arg Leu Thr Gln 
        25                  30                  35 

CTG GTC CAT GAT ACC TGC GTA GAG AAC GAG AGC ATA TTT CTC AAT TCT 1637 
Leu Val His Asp Thr Cys Val Glu Asn Glu Ser Ile Phe Leu Asn Ser 
    40                  45                  50 

CTG CCA GGG AAT GAA GCT TTT TTA AGG TTG CTT CGG AGC GGC TAT TTT 1685 
Leu Pro Gly Asn Glu Ala Phe Leu Arg Leu Leu Arg Ser Gly Tyr Phe 
55                  60                  65                  70 

GAA GTG TTT GAC GTG TTT GTG GTG CCT GAG CTG CAT CTG GAC ACT CCG 1733 
Glu Val Phe Asp Val Phe Val Val Pro Glu Leu His Leu Asp Thr Pro 
                75                  80                  85 

GGT CGA GTG GTC GCC GCT CTT GCT CTG CTG GTG TTC ATC CTC AAC GAT 1781 
Gly Arg Val Val Ala Ala Leu Ala Leu Leu Val Phe Ile Leu Asn Asp 
            90                  95                  100 

TTA GAC GCT AAT TCT GCT TCT TCA GGC TTT GAT TCA GGT TTT CTC GTG 1829 
Leu Asp Ala Asn Ser Ala Ser Ser Gly Phe Asp Ser Gly Phe Leu Val 
        105                 110                 115 

GAC CGT CTC TGC GTG CCG CTA TGG CTG AAG GCC AGG GCG TTC AAG ATC 1877 


WO 95/16048 PCT/CA94/00678 

-62- 

Asp Arg Leu Cys Val Pro Leu Trp Leu Lys Ala Arg Ala Phe Lys Ile 
    120                 125                 130 

ACC CAG AGC TCC AGG AGC ACT TCG CAG CCT TCC TCG TCG CCC GAC AAG 1925 
Thr Gln Ser Ser Arg Ser Thr Ser Gln Pro Ser Ser Ser Pro Asp Lys 
135                 140                 145                 150 

ACG ACC CAG ACT ACC AGC CAG TAGACGGGGA CAGCCCACCC CGGGCTAGCC 1976 
Thr Thr Gln Thr Thr Ser Gln 
                155 

TGGAGGAGGC TGAACAGAGC AGCACTCGTT TCGAGCACAT CAGTTACCGA GACGTGGTGG 2036 

ATGACTTCAA TAGATGCCAT GATGTTTTTT ATGAGAGGTA CAGTTTTGAG GACATAAAGA 2096 

GCTACGAGGC TTTGCCTGAG GACAATTTGG AGCAGCTCAT AGCTATGCAT GCTAAAATCA 2156 

AGCTGCTGCC CGGTCGGGAG TATGAGTTGA CTCAACCTTT GAACATAACA TCTTGCGCCT 2216 

ATGTGCTCGG AAATGGGGCT ACTATTAGGG TAACAGGGGA AGCCTCCCCG GCTATTAGAG 2276 

TGGGGGCCAT GGCCGTGGGT CCGTGTGTAA CAGGAATGAC TGGGGTGACT TTTGTGAATT 2336 

GTAGGTTTGA GAGAGAGTCA ACAATTAGGG GGTCCCTGAT ACGAGCTTCA ACTCACGTGC 2396 

TGTTTCATGG CTGTTATTTT ATGGGAATTA TGGGCACTTG TATTGAGGTG GGGGCGGGAG 2456 

CTTACATTCG GGGTTGTGAG TTTGTGGGCT GTTACCGGGG AATCTGTTCT ACTTCTAACA 2516 

GAGATATTAA GGTGAGGCAG TGCAACTTTG ACAAATGCTT ACTGGGTATT ACTTGTAAGG 2576 

GGGACTATCG TCTTTCGGGA AATGTGTGTT CTGAGACTTT CTGCTTTGCT CATTTAGAGG 2636 

GAGAGGGTTT GGTTAAAAAC AACACAGTCA AGTCCCCTAG TCGCTGGACC AGCGAGTCTG 2696 

GCTTTTCCAT GATAACTTGT GCAGACGGCA GGGTTACGCC TTTGGGTTCC CTCCACATTG 2756 

TGGGCAACCG TTGTAGGCGT TGGCCAACCA TGCAGGGGAA TGTGTTTATC ATGTCTAAAC 2816 

TGTATCTGGG CAACAGAATA GGGACTGTAG CCCTGCCCCA GTGTGCTTTC TACAAGTCCA 2876 

GCATTTGTTT GGAGGAGAGG GCGACAAACA AGCTGGTCTT GGCTTGTGCT TTTGAGAATA 2936 

ATGTACTGGT GTACAAAGTG CTGAGACGGG AGAGTCCCTC AACCGTGAAA ATGTGTGTTT 2996 

GTGGGACTTC TCATTATGCA AAGCCTTTGA CACTGGCAAT TATTTCTTCA GATATTCGGG 3056 

CTAATCGATA CATGTACACT GTGGACTCAA CAGAGTTCAC TTCTGACGAG GATTAAAAGT 3116 

GGGCGGGGCC AAGAGGGGTA TAAATAGGTG GGGAGGTTGA GGGGAGCCGT AGTTTCTGTT 3176 

TTTCCCAGAC TGGGGGGGAC AACATGGCCG AGGAAGGGCG CATTTATGTG CCTTATGTAA 3236 

CTGCCCGCCT GCCCAAGTGG TCGGGTTCGG TGCAGGATAA GACGGGCTCG AACATGTTGG 3296 

GGGGTGTGGT ACTCCCTCCT AATTCACAGG CGCACCGGAC GGAGACCGTG GGCACTGAGG 3356 

CCACCAGAGA CAACCTGCAC GCCGAGGGAG CGCGTCGTCC TGAGGATCAG ACGCCCTACA 3416 

TGATCTTGGT GGAGGACTCT CTGGGAGGTT TGAAGAGGCG AATGGACTTG CTGGAAGAAT 3476 

CTAATCAGCA GCTGCTGGCA ACTCTCAACC GTCTCCGTAC AGGACTCGCT GCCTATGTGC 3536 

AGGCTAACCT TGTGGGCGGC CAAGTTAACC CCTTTGTTTA AATAAAAATA CACTCATACA 3596 
GTTTATTATG CTGTCAATAA AATTCTTTAT TTTTCCTGTG ATAATACCGT GTCCAGCGTG 3656 

CTCTGTCAAT AAGGGTCCTA TGCATCCTGA GAAGGGCCTC ATATACCCAT GGCATGAATA 3716 

TTAAGATACA TGGGCATAAG GCCCTCAGAA GGGTTGAGGT AGAGCCACTG CAGACTTTCG 3776 

TGGGGAGGTA AGGTGTTGTA AATAATCCAG TCATACTGAC TGTGCTGGGC GTGGAAGGAA 3836 

AAGATGTCTT TTAGAAGAAG GGTGATTGGC AAAGGGAGGC TCTTAGTGTA GGTATTGATA 3896 

AATCTGTTCA GTTGGGAGGG ATGCATTCGG GGGCTAATAA GGTGGAGTTT AGCCTGAATC 3956 
TTAAGGTTGG CAATGTTGCC CCCTAGGTCT TTGCGAGGAT TCATGTTGTG CAGTACCACA 4016 

AAAACAGAGT AGCCTGTGCA TTTGGGGAAT TTATCATGAA GCTT 4060 

(2) INFORMATION FOR SEQ ID NO:4: 

     (i) SEQUENCE CHARACTERISTICS: 
          (A) LENGTH: 157 amino acids 
          (B) TYPE: amino acid 
          (D) TOPOLOGY: linear 

    (ii) MOLECULE TYPE: protein 

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

Met Asp His Leu Ser Val Leu Leu Asp Leu Lys Leu Leu Arg Ser Ile 
1               5                   10                  15 

Val Ala Gly Ala Ser Asn Arg Thr Gly Val Trp Lys Arg Arg Leu Trp 
            20                  25                  30 

Leu Gly Arg Leu Thr Gln Leu Val His Asp Thr Cys Val Glu Asn Glu 
        35                  40                  45 

Ser Ile Phe Leu Asn Ser Leu Pro Gly Asn Glu Ala Phe Leu Arg Leu 
    50                  55                  60 

Leu Arg Ser Gly Tyr Phe Glu Val Phe Asp Val Phe Val Val Pro Glu 
65                  70                  75                  80 

Leu His Leu Asp Thr Pro Gly Arg Val Val Ala Ala Leu Ala Leu Leu 
                85                  90                  95 

Val Phe Ile Leu Asn Asp Leu Asp Ala Asn Ser Ala Ser Ser Gly Phe 
            100                 105                 110 

Asp Ser Gly Phe Leu Val Asp Arg Leu Cys Val Pro Leu Trp Leu Lys 
        115                 120                 125 

Ala Arg Ala Phe Lys Ile Thr Gln Ser Ser Arg Ser Thr Ser Gln Pro 
    130                 135                 140 

Ser Ser Ser Pro Asp Lys Thr Thr Gln Thr Thr Ser Gln 
145                 150                 155 
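
The correspondence between a nucleotide sequence and its listed translation can be spot-checked with the standard genetic code. The Python sketch below is illustrative only: it builds the standard codon table and translates the first and last codons of the CDS in SEQ ID NO:3 (positions 1476 to 1493, and 1926 to 1949 including the stop codon). The function and variable names are hypothetical.

```python
# Illustrative sketch: translate short stretches of the SEQ ID NO:3 coding
# region with the standard genetic code and compare them with SEQ ID NO:4.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 1 to one-letter amino acids ('*' marks a stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


orf_start = "ATGGATCACTTAAGCGTT"      # positions 1476-1493
orf_end = "ACGACCCAGACTACCAGCCAGTAG"  # positions 1926-1949, stop included

print(translate(orf_start))  # first residues of SEQ ID NO:4
print(translate(orf_end))    # last residues of SEQ ID NO:4, then stop
```

The two stretches translate to Met-Asp-His-Leu-Ser-Val and Thr-Thr-Gln-Thr-Thr-Ser-Gln followed by a stop codon, matching the first six and last seven residues listed for SEQ ID NO:4.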

(2) INFORMATION FOR SEQ ID NO:5: 

     (i) SEQUENCE CHARACTERISTICS: 
          (A) LENGTH: 4060 base pairs 
          (B) TYPE: nucleic acid 
          (C) STRANDEDNESS: double 
          (D) TOPOLOGY: linear 

    (ii) MOLECULE TYPE: DNA (genomic) 

    (ix) FEATURE: 
          (A) NAME/KEY: CDS 
          (B) LOCATION: 1850..3109 

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60 

CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120 

CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180

CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240 

TAGCAAATTT GCGCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300 

TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360 

AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420 
CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480

ACAAATTTGC CGAGTAATTG TGCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540 

CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600 

CCACCATGAA GTACCTGGTC CTCGTTCTCA ACGACGGCAT GAGTCGAATT GAAAAAGCTC 660 

TCCTGTGCAG CGATGGTGAG GTGGATTTAG AGTGTCATGA GGTACTTCCC CCTTCTCCCG 720

CGCCTGTCCC CGCTTCTGTG TCACCCGTGA GGAGTCCTCC TCCTCTGTCT CCGGTGTTTC 780

CTCCGTCTCC GCCAGCCCCG CTTGTGAATC CAGAGGCGAG TTCGCTGCTG CAGCAGTATC 840 

GGAGAGAGCT GTTAGAGAGG AGCCTGCTCC GAACGGCCGA AGGTCAGCAG CGTGCAGTGT 900 

GTCCATGTGA GCGGTTGCCC GTGGAAGAGG ATGAGTGTCT GAATGCCGTA AATTTGCTGT 960 

TTCCTGATCC CTGGCTAAAT GCAGCTGAAA ATGGGGGTGA TATTTTTAAG TCTCCGGCTA 1020

TGTCTCCAGA ACCGTGGATA GATTTGTCTA GCTACGATAG CGATGTAGAA GAGGTGACTA 1080 

GTCACTTTTT TCTGGATTGC CCTGAAGACC CCAGTCGGGA GTGTTCATCT TGTGGGTTTC 1140 

ATCAGGCTCA AAGCGGAATT CCAGGCATTA TGTGCAGTTT GTGCTACATG CGCCAAACCT 1200

ACCATTGCAT CTATAGTAAG TACATTCTGT AAAAGAACAT CTTGGTGATT TCTAGGTATT 1260 

GTTTAGGGAT TAACTGGGTG GAGTGATCTT AATCCGGCAT AACCAAATAC ATGTTTTCAC 1320

AGGTCCAGTT TCTGAAGAGG AAATGTGAGT CATGTTGACT TTGGCGCGCA AGAGGAAATG 1380 

TGAGTCATGT TGACTTTGGC GCGCCCTACG GTGACTTTAA AGCAATTTGA GGATCACTTT 1440 

TTTGTTAGTC GCTATAAAGT AGTCACGGAG TCTTCATGGA TCACTTAAGC GTTCTTTTGG 1500 

ATTTGAAGCT GCTTCGCTCT ATCGTAGCGG GGGCTTCAAA TCGCACTGGA GTGTGGAAGA 1560 

GGCGGCTGTG GCTGGGACGC CTGACTCAAC TGGTCCATGA TACCTGCGTA GAGAACGAGA 1620

GCATATTTCT CAATTCTCTG CCAGGGAATG AAGCTTTTTT AAGGTTGCTT CGGAGCGGCT 1680 

ATTTTGAAGT GTTTGACGTG TTTGTGGTGC CTGAGCTGCA TCTGGACACT CCGGGTCGAG 1740 

TGGTCGCCGC TCTTGCTCTG CTGGTGTTCA TCCTCAACGA TTTAGACGCT AATTCTGCTT 1800 

CTTCAGGCTT TGATTCAGGT TTTCTCGTGG ACCGTCTCTG CGTGCCGCT ATG GCT 1855
Met Ala
1

GAA GGC CAG GGC GTT CAA GAT CAC CCA GAG CTC CAG GAG CAC TTC GCA 1903
Glu Gly Gln Gly Val Gln Asp His Pro Glu Leu Gln Glu His Phe Ala
5 10 15

GCC TTC CTC GTC GCC CGA CAA GAC GAC CCA GAC TAC CAG CCA GTA GAC 1951
Ala Phe Leu Val Ala Arg Gln Asp Asp Pro Asp Tyr Gln Pro Val Asp
20 25 30

GGG GAC AGC CCA CCC CGG GCT AGC CTG GAG GAG GCT GAA CAG AGC AGC 1999
Gly Asp Ser Pro Pro Arg Ala Ser Leu Glu Glu Ala Glu Gln Ser Ser
35 40 45 50

ACT CGT TTC GAG CAC ATC AGT TAC CGA GAC GTG GTG GAT GAC TTC AAT 2047
Thr Arg Phe Glu His Ile Ser Tyr Arg Asp Val Val Asp Asp Phe Asn
55 60 65

AGA TGC CAT GAT GTT TTT TAT GAG AGG TAC AGT TTT GAG GAC ATA AAG 2095
Arg Cys His Asp Val Phe Tyr Glu Arg Tyr Ser Phe Glu Asp Ile Lys
70 75 80

AGC TAC GAG GCT TTG CCT GAG GAC AAT TTG GAG CAG CTC ATA GCT ATG 2143
Ser Tyr Glu Ala Leu Pro Glu Asp Asn Leu Glu Gln Leu Ile Ala Met
85 90 95

CAT GCT AAA ATC AAG CTG CTG CCC GGT CGG GAG TAT GAG TTG ACT CAA 2191
His Ala Lys Ile Lys Leu Leu Pro Gly Arg Glu Tyr Glu Leu Thr Gln
100 105 110

CCT TTG AAC ATA ACA TCT TGC GCC TAT GTG CTC GGA AAT GGG GCT ACT 2239
Pro Leu Asn Ile Thr Ser Cys Ala Tyr Val Leu Gly Asn Gly Ala Thr
115 120 125 130

ATT AGG GTA ACA GGG GAA GCC TCC CCG GCT ATT AGA GTG GGG GCC ATG 2287
Ile Arg Val Thr Gly Glu Ala Ser Pro Ala Ile Arg Val Gly Ala Met
135 140 145

GCC GTG GGT CCG TGT GTA ACA GGA ATG ACT GGG GTG ACT TTT GTG AAT 2335
Ala Val Gly Pro Cys Val Thr Gly Met Thr Gly Val Thr Phe Val Asn
150 155 160

TGT AGG TTT GAG AGA GAG TCA ACA ATT AGG GGG TCC CTG ATA CGA GCT 2383
Cys Arg Phe Glu Arg Glu Ser Thr Ile Arg Gly Ser Leu Ile Arg Ala
165 170 175

TCA ACT CAC GTG CTG TTT CAT GGC TGT TAT TTT ATG GGA ATT ATG GGC 2431
Ser Thr His Val Leu Phe His Gly Cys Tyr Phe Met Gly Ile Met Gly
180 185 190

ACT TGT ATT GAG GTG GGG GCG GGA GCT TAC ATT CGG GGT TGT GAG TTT 2479
Thr Cys Ile Glu Val Gly Ala Gly Ala Tyr Ile Arg Gly Cys Glu Phe
195 200 205 210

GTG GGC TGT TAC CGG GGA ATC TGT TCT ACT TCT AAC AGA GAT ATT AAG 2527
Val Gly Cys Tyr Arg Gly Ile Cys Ser Thr Ser Asn Arg Asp Ile Lys
215 220 225

GTG AGG CAG TGC AAC TTT GAC AAA TGC TTA CTG GGT ATT ACT TGT AAG 2575
Val Arg Gln Cys Asn Phe Asp Lys Cys Leu Leu Gly Ile Thr Cys Lys
230 235 240

GGG GAC TAT CGT CTT TCG GGA AAT GTG TGT TCT GAG ACT TTC TGC TTT 2623
Gly Asp Tyr Arg Leu Ser Gly Asn Val Cys Ser Glu Thr Phe Cys Phe
245 250 255

GCT CAT TTA GAG GGA GAG GGT TTG GTT AAA AAC AAC ACA GTC AAG TCC 2671
Ala His Leu Glu Gly Glu Gly Leu Val Lys Asn Asn Thr Val Lys Ser
260 265 270

CCT AGT CGC TGG ACC AGC GAG TCT GGC TTT TCC ATG ATA ACT TGT GCA 2719
Pro Ser Arg Trp Thr Ser Glu Ser Gly Phe Ser Met Ile Thr Cys Ala
275 280 285 290

GAC GGC AGG GTT ACG CCT TTG GGT TCC CTC CAC ATT GTG GGC AAC CGT 2767
Asp Gly Arg Val Thr Pro Leu Gly Ser Leu His Ile Val Gly Asn Arg
295 300 305

TGT AGG CGT TGG CCA ACC ATG CAG GGG AAT GTG TTT ATC ATG TCT AAA 2815
Cys Arg Arg Trp Pro Thr Met Gln Gly Asn Val Phe Ile Met Ser Lys
310 315 320

CTG TAT CTG GGC AAC AGA ATA GGG ACT GTA GCC CTG CCC CAG TGT GCT 2863
Leu Tyr Leu Gly Asn Arg Ile Gly Thr Val Ala Leu Pro Gln Cys Ala
325 330 335

TTC TAC AAG TCC AGC ATT TGT TTG GAG GAG AGG GCG ACA AAC AAG CTG 2911
Phe Tyr Lys Ser Ser Ile Cys Leu Glu Glu Arg Ala Thr Asn Lys Leu
340 345 350

GTC TTG GCT TGT GCT TTT GAG AAT AAT GTA CTG GTG TAC AAA GTG CTG 2959
Val Leu Ala Cys Ala Phe Glu Asn Asn Val Leu Val Tyr Lys Val Leu
355 360 365 370

AGA CGG GAG AGT CCC TCA ACC GTG AAA ATG TGT GTT TGT GGG ACT TCT 3007
Arg Arg Glu Ser Pro Ser Thr Val Lys Met Cys Val Cys Gly Thr Ser
375 380 385

CAT TAT GCA AAG CCT TTG ACA CTG GCA ATT ATT TCT TCA GAT ATT CGG 3055
His Tyr Ala Lys Pro Leu Thr Leu Ala Ile Ile Ser Ser Asp Ile Arg
390 395 400

GCT AAT CGA TAC ATG TAC ACT GTG GAC TCA ACA GAG TTC ACT TCT GAC 3103
Ala Asn Arg Tyr Met Tyr Thr Val Asp Ser Thr Glu Phe Thr Ser Asp
405 410 415

GAG GAT TAAAAGTGGG CGGGGCCAAG AGGGGTATAA ATAGGTGGGG AGGTTGAGGG 3159
Glu Asp
420

GAGCCGTAGT TTCTGTTTTT CCCAGACTGG GGGGGACAAC ATGGCCGAGG AAGGGCGCAT 3219 

TTATGTGCCT TATGTAACTG CCCGCCTGCC CAAGTGGTCG GGTTCGGTGC AGGATAAGAC 3279

GGGCTCGAAC ATGTTGGGGG GTGTGGTACT CCCTCCTAAT TCACAGGCGC ACCGGACGGA 3339 

GACCGTGGGC ACTGAGGCCA CCAGAGACAA CCTGCACGCC GAGGGAGCGC GTCGTCCTGA 3399 

GGATCAGACG CCCTACATGA TCTTGGTGGA GGACTCTCTG GGAGGTTTGA AGAGGCGAAT 3459

GGACTTGCTG GAAGAATCTA ATCAGCAGCT GCTGGCAACT CTCAACCGTC TCCGTACAGG 3519 

ACTCGCTGCC TATGTGCAGG CTAACCTTGT GGGCGGCCAA GTTAACCCCT TTGTTTAAAT 3579 

AAAAATACAC TCATACAGTT TATTATGCTG TCAATAAAAT TCTTTATTTT TCCTGTGATA 3639 

ATACCGTGTC CAGCGTGCTC TGTCAATAAG GGTCCTATGC ATCCTGAGAA GGGCCTCATA 3699 

TACCCATGGC ATGAATATTA AGATACATGG GCATAAGGCC CTCAGAAGGG TTGAGGTAGA 3759 

GCCACTGCAG ACTTTCGTGG GGAGGTAAGG TGTTGTAAAT AATCCAGTCA TACTGACTGT 3819

GCTGGGCGTG GAAGGAAAAG ATGTCTTTTA GAAGAAGGGT GATTGGCAAA GGGAGGCTCT 3879

TAGTGTAGGT ATTGATAAAT CTGTTCAGTT GGGAGGGATG CATTCGGGGG CTAATAAGGT 3939 

GGAGTTTAGC CTGAATCTTA AGGTTGGCAA TGTTGCCCCC TAGGTCTTTG CGAGGATTCA 3999 

TGTTGTGCAG TACCACAAAA ACAGAGTAGC CTGTGCATTT GGGGAATTTA TCATGAAGCT 4059

T 4060

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 420 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ala Glu Gly Gln Gly Val Gln Asp His Pro Glu Leu Gln Glu His
1 5 10 15

Phe Ala Ala Phe Leu Val Ala Arg Gln Asp Asp Pro Asp Tyr Gln Pro
20 25 30

Val Asp Gly Asp Ser Pro Pro Arg Ala Ser Leu Glu Glu Ala Glu Gln
35 40 45

Ser Ser Thr Arg Phe Glu His Ile Ser Tyr Arg Asp Val Val Asp Asp
50 55 60

Phe Asn Arg Cys His Asp Val Phe Tyr Glu Arg Tyr Ser Phe Glu Asp
65 70 75 80

Ile Lys Ser Tyr Glu Ala Leu Pro Glu Asp Asn Leu Glu Gln Leu Ile
85 90 95

Ala Met His Ala Lys Ile Lys Leu Leu Pro Gly Arg Glu Tyr Glu Leu
100 105 110

Thr Gln Pro Leu Asn Ile Thr Ser Cys Ala Tyr Val Leu Gly Asn Gly
115 120 125

Ala Thr Ile Arg Val Thr Gly Glu Ala Ser Pro Ala Ile Arg Val Gly
130 135 140

Ala Met Ala Val Gly Pro Cys Val Thr Gly Met Thr Gly Val Thr Phe
145 150 155 160

Val Asn Cys Arg Phe Glu Arg Glu Ser Thr Ile Arg Gly Ser Leu Ile
165 170 175

Arg Ala Ser Thr His Val Leu Phe His Gly Cys Tyr Phe Met Gly Ile
180 185 190

Met Gly Thr Cys Ile Glu Val Gly Ala Gly Ala Tyr Ile Arg Gly Cys
195 200 205

Glu Phe Val Gly Cys Tyr Arg Gly Ile Cys Ser Thr Ser Asn Arg Asp
210 215 220

Ile Lys Val Arg Gln Cys Asn Phe Asp Lys Cys Leu Leu Gly Ile Thr
225 230 235 240

Cys Lys Gly Asp Tyr Arg Leu Ser Gly Asn Val Cys Ser Glu Thr Phe
245 250 255

Cys Phe Ala His Leu Glu Gly Glu Gly Leu Val Lys Asn Asn Thr Val
260 265 270

Lys Ser Pro Ser Arg Trp Thr Ser Glu Ser Gly Phe Ser Met Ile Thr
275 280 285

Cys Ala Asp Gly Arg Val Thr Pro Leu Gly Ser Leu His Ile Val Gly
290 295 300

Asn Arg Cys Arg Arg Trp Pro Thr Met Gln Gly Asn Val Phe Ile Met
305 310 315 320

Ser Lys Leu Tyr Leu Gly Asn Arg Ile Gly Thr Val Ala Leu Pro Gln
325 330 335

Cys Ala Phe Tyr Lys Ser Ser Ile Cys Leu Glu Glu Arg Ala Thr Asn
340 345 350

Lys Leu Val Leu Ala Cys Ala Phe Glu Asn Asn Val Leu Val Tyr Lys
355 360 365

Val Leu Arg Arg Glu Ser Pro Ser Thr Val Lys Met Cys Val Cys Gly
370 375 380

Thr Ser His Tyr Ala Lys Pro Leu Thr Leu Ala Ile Ile Ser Ser Asp
385 390 395 400

Ile Arg Ala Asn Arg Tyr Met Tyr Thr Val Asp Ser Thr Glu Phe Thr
405 410 415

Ser Asp Glu Asp
420
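
The CDS coordinates given under (ix) FEATURE can be cross-checked against the stated protein lengths. The following is an illustrative sketch only and is not part of the original disclosure: the function name `codon_count` is hypothetical, and it assumes that LOCATION coordinates are 1-based, inclusive, and exclude the stop codon.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of complete codons in an inclusive, 1-based CDS range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a whole number of codons")
    return length // 3


# CDS ranges and protein lengths as stated in this listing.
stated = {
    "SEQ ID NO:5 (protein: SEQ ID NO:6)": ((1850, 3109), 420),
    "SEQ ID NO:7 (protein: SEQ ID NO:8)": ((3200, 3574), 125),
}

for name, ((start, end), protein_length) in stated.items():
    codons = codon_count(start, end)
    status = "consistent" if codons == protein_length else "MISMATCH"
    print(f"{name}: {codons} codons vs {protein_length} residues -> {status}")
```

A mismatch reported by such a check would point to a transcription error in either the LOCATION field or the LENGTH field rather than in the sequence itself.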

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4060 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 3200..3574

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CATCATCAAT AATCTACAGT ACACTGATGG CAGCGGTCCA ACTGCCAATC ATTTTTGCCA 60 

CGTCATTTAT GACGCAACGA CGGCGAGCGT GGCGTGCTGA CGTAACTGTG GGGCGGAGCG 120 

CGTCGCGGAG GCGGCGGCGC TGGGCGGGGC TGAGGGCGGC GGGGGCGGCG CGCGGGGCGG 180 

CGCGCGGGGC GGGGCGAGGG GCGGAGTTCC GCACCCGCTA CGTCATTTTC AGACATTTTT 240 
TAGCAAATTT GCCCCTTTTG CAAGCATTTT TCTCACATTT CAGGTATTTA GAGGGCGGAT 300

TTTTGGTGTT CGTACTTCCG TGTCACATAG TTCACTGTCA ATCTTCATTA CGGCTTAGAC 360 

AAATTTTCGG CGTCTTTTCC GGGTTTATGT CCCCGGTCAC CTTTATGACT GTGTGAAACA 420 

CACCTGCCCA TTGTTTACCC TTGGTCAGTT TTTTCGTCTC CTAGGGTGGG AACATCAAGA 480 

ACAAATTTGC CGAGTAATTG TCCACCTTTT TCCGCGTTAG GACTGCGTTT CACACGTAGA 540 

CAGACTTTTT CTCATTTTCT CACACTCCGT CGTCCGCTTC AGAGCTCTGC GTCTTCGCTG 600 

CCACCATGAA GTACCTGGTC CTCGTTCTCA ACGACGGCAT GAGTCGAATT GAAAAAGCTC 660

TCCTGTGCAG CGATGGTGAG GTGGATTTAG AGTGTCATGA GGTACTTCCC CCTTCTCCCG 720 

CGCCTGTCCC CGCTTCTGTG TCACCCGTGA GGAGTCCTCC TCCTCTGTCT CCGGTGTTTC 780 

CTCCGTCTCC GCCAGCCCCG CTTGTGAATC CAGAGGCGAG TTCGCTGCTG CAGCAGTATC 840 

GGAGAGAGCT GTTAGAGAGG AGCCTGCTCC GAACGGCCGA AGGTCAGCAG CGTGCAGTGT 900 

GTCCATGTGA GCGGTTGCCC GTGGAAGAGG ATGAGTGTCT GAATGCCGTA AATTTGCTGT 960 

TTCCTGATCC CTGGCTAAAT GCAGCTGAAA ATGGGGGTGA TATTTTTAAG TCTCCGGCTA 1020 

TGTCTCCAGA ACCGTGGATA GATTTGTCTA GCTACGATAG CGATGTAGAA GAGGTGACTA 1080 

GTCACTTTTT TCTGGATTGC CCTGAAGACC CCAGTCGGGA GTGTTCATCT TGTGGGTTTC 1140 

ATCAGGCTCA AAGCGGAATT CCAGGCATTA TGTGCAGTTT GTGCTACATG CGCCAAACCT 1200 

ACCATTGCAT CTATAGTAAG TACATTCTGT AAAAGAACAT CTTGGTGATT TCTAGGTATT 1260 

GTTTAGGGAT TAACTGGGTG GAGTGATCTT AATCCGGCAT AACCAAATAC ATGTTTTCAC 1320 

AGGTCCAGTT TCTGAAGAGG AAATGTGAGT CATGTTGACT TTGGCGCGCA AGAGGAAATG 1380 

TGAGTCATGT TGACTTTGGC GCGCCCTACG GTGACTTTAA AGCAATTTGA GGATCACTTT 1440 

TTTGTTAGTC GCTATAAAGT AGTCACGGAG TCTTCATGGA TCACTTAAGC GTTCTTTTGG 1500 

ATTTGAAGCT GCTTCGCTCT ATCGTAGCGG GGGCTTCAAA TCGCACTGGA GTGTGGAAGA 1560

GGCGGCTGTG GCTGGGACGC CTGACTCAAC TGGTCCATGA TACCTGCGTA GAGAACGAGA 1620 

GCATATTTCT CAATTCTCTG CCAGGGAATG AAGCTTTTTT AAGGTTGCTT CGGAGCGGCT 1680

ATTTTGAAGT GTTTGACGTG TTTGTGGTGC CTGAGCTGCA TCTGGACACT CCGGGTCGAG 1740 

TGGTCGCCGC TCTTGCTCTG CTGGTGTTCA TCCTCAACGA TTTAGACGCT AATTCTGCTT 1800 

CTTCAGGCTT TGATTCAGGT TTTCTCGTGG ACCGTCTCTG CGTGCCGCTA TGGCTGAAGG 1860 

CCAGGGCGTT CAAGATCACC CAGAGCTCCA GGAGCACTTC GCAGCCTTCC TCGTCGCCCG 1920 

ACAAGACGAC CCAGACTACC AGCCAGTAGA CGGGGACAGC CCACCCCGGG CTAGCCTGGA 1980 

GGAGGCTGAA CAGAGCAGCA CTCGTTTCGA GCACATCAGT TACCGAGACG TGGTGGATGA 2040 

CTTCAATAGA TGCCATGATG TTTTTTATGA GAGGTACAGT TTTGAGGACA TAAAGAGCTA 2100 

CGAGGCTTTG CCTGAGGACA ATTTGGAGCA GCTCATAGCT ATGCATGCTA AAATCAAGCT 2160 

GCTGCCCGGT CGGGAGTATG AGTTGACTCA ACCTTTGAAC ATAACATCTT GCGCCTATGT 2220

GCTCGGAAAT GGGGCTACTA TTAGGGTAAC AGGGGAAGCC TCCCCGGCTA TTAGAGTGGG 2280 

GGCCATGGCC GTGGGTCCGT GTGTAACAGG AATGACTGGG GTGACTTTTG TGAATTGTAG 2340 

GTTTGAGAGA GAGTCAACAA TTAGGGGGTC CCTGATACGA GCTTCAACTC ACGTGCTGTT 2400 

TCATGGCTGT TATTTTATGG GAATTATGGG CACTTGTATT GAGGTGGGGG CGGGAGCTTA 2460

CATTCGGGGT TGTGAGTTTG TGGGCTGTTA CCGGGGAATC TGTTCTACTT CTAACAGAGA 2520 

TATTAAGGTG AGGCAGTGCA ACTTTGACAA ATGCTTACTG GGTATTACTT GTAAGGGGGA 2580 
CTATCGTCTT TCGGGAAATG TGTGTTCTGA GACTTTCTGC TTTGCTCATT TAGAGGGAGA 2640

GGGTTTGGTT AAAAACAACA CAGTCAAGTC CCCTAGTCGC TGGACCAGCG AGTCTGGCTT 2700

TTCCATGATA ACTTGTGCAG ACGGCAGGGT TACGCCTTTG GGTTCCCTCC ACATTGTGGG 2760

CAACCGTTGT AGGCGTTGGC CAACCATGCA GGGGAATGTG TTTATCATGT CTAAACTGTA 2820

TCTGGGCAAC AGAATAGGGA CTGTAGCCCT GCCCCAGTGT GCTTTCTACA AGTCCAGCAT 2880

TTGTTTGGAG GAGAGGGCGA CAAACAAGCT GGTCTTGGCT TGTGCTTTTG AGAATAATGT 2940

ACTGGTGTAC AAAGTGCTGA GACGGGAGAG TCCCTCAACC GTGAAAATGT GTGTTTGTGG 3000

GACTTCTCAT TATGCAAAGC CTTTGACACT GGCAATTATT TCTTCAGATA TTCGGGCTAA 3060

TCGATACATG TACACTGTGG ACTCAACAGA GTTCACTTCT GACGAGGATT AAAAGTGGGC 3120

GGGGCCAAGA GGGGTATAAA TAGGTGGGGA GGTTGAGGGG AGCCGTAGTT TCTGTTTTTC 3180

CCAGACTGGG GGGGACAAC ATG GCC GAG GAA GGG CGC ATT TAT GTG CCT TAT 3232
Met Ala Glu Glu Gly Arg Ile Tyr Val Pro Tyr
1 5 10

GTA ACT GCC CGC CTG CCC AAG TGG TCG GGT TCG GTG CAG GAT AAG ACG 3280 
Val Thr Ala Arg Leu Pro Lys Trp Ser Gly Ser Val Gln Asp Lys Thr
15 20 25 

GGC TCG AAC ATG TTG GGG GGT GTG GTA CTC CCT CCT AAT TCA CAG GCG 3328 
Gly Ser Asn Met Leu Gly Gly Val Val Leu Pro Pro Asn Ser Gln Ala
30 35 40 

CAC CGG ACG GAG ACC GTG GGC ACT GAG GCC ACC AGA GAC AAC CTG CAC 3376 
His Arg Thr Glu Thr Val Gly Thr Glu Ala Thr Arg Asp Asn Leu His 
45 50 55 

GCC GAG GGA GCG CGT CGT CCT GAG GAT CAG ACG CCC TAC ATG ATC TTG 3424 
Ala Glu Gly Ala Arg Arg Pro Glu Asp Gln Thr Pro Tyr Met Ile Leu
60 65 70 75 

GTG GAG GAC TCT CTG GGA GGT TTG AAG AGG CGA ATG GAC TTG CTG GAA 3472 
Val Glu Asp Ser Leu Gly Gly Leu Lys Arg Arg Met Asp Leu Leu Glu 
80 85 90 

GAA TCT AAT CAG CAG CTG CTG GCA ACT CTC AAC CGT CTC CGT ACA GGA 3520 
Glu Ser Asn Gln Gln Leu Leu Ala Thr Leu Asn Arg Leu Arg Thr Gly
95 100 105 

CTC GCT GCC TAT GTG CAG GCT AAC CTT GTG GGC GGC CAA GTT AAC CCC 3568 
Leu Ala Ala Tyr Val Gln Ala Asn Leu Val Gly Gly Gln Val Asn Pro
110 115 120 

TTT GTT TAAATAAAAA TACACTCATA CAGTTTATTA TGCTGTCAAT AAAATTCTTT 3624 
Phe Val 
125 



ATTTTTCCTG TGATAATACC GTGTCCAGCG TGCTCTGTCA ATAAGGGTCC TATGCATCCT 3684

GAGAAGGGCC TCATATACCC ATGGCATGAA TATTAAGATA CATGGGCATA AGGCCCTCAG 3744

AAGGGTTGAG GTAGAGCCAC TGCAGACTTT CGTGGGGAGG TAAGGTGTTG TAAATAATCC 3804

AGTCATACTG ACTGTGCTGG GCGTGGAAGG AAAAGATGTC TTTTAGAAGA AGGGTGATTG 3864

GCAAAGGGAG GCTCTTAGTG TAGGTATTGA TAAATCTGTT CAGTTGGGAG GGATGCATTC 3924

GGGGGCTAAT AAGGTGGAGT TTAGCCTGAA TCTTAAGGTT GGCAATGTTG CCCCCTAGGT 3984

CTTTGCGAGG ATTCATGTTG TGCAGTACCA CAAAAACAGA GTAGCCTGTG CATTTGGGGA 4044

ATTTATCATG AAGCTT 4060



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 125 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Ala Glu Glu Gly Arg Ile Tyr Val Pro Tyr Val Thr Ala Arg Leu
1 5 10 15

Pro Lys Trp Ser Gly Ser Val Gln Asp Lys Thr Gly Ser Asn Met Leu
20 25 30

Gly Gly Val Val Leu Pro Pro Asn Ser Gln Ala His Arg Thr Glu Thr
35 40 45

Val Gly Thr Glu Ala Thr Arg Asp Asn Leu His Ala Glu Gly Ala Arg
50 55 60

Arg Pro Glu Asp Gln Thr Pro Tyr Met Ile Leu Val Glu Asp Ser Leu
65 70 75 80

Gly Gly Leu Lys Arg Arg Met Asp Leu Leu Glu Glu Ser Asn Gln Gln
85 90 95

Leu Leu Ala Thr Leu Asn Arg Leu Arg Thr Gly Leu Ala Ala Tyr Val
100 105 110

Gln Ala Asn Leu Val Gly Gly Gln Val Asn Pro Phe Val
115 120 125

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Glu Glu Phe Val Leu Asp Tyr Val Glu His Pro Gly His Gly Cys Arg
1 5 10 15

Ser Cys His Tyr His Arg Arg Asn Thr Gly Asp Pro Asp Ile Met Cys
20 25 30

Ser Leu Cys Tyr Met Arg Thr Cys Gly Met Phe Val Tyr Ser Pro Val
35 40 45

Ser Glu Pro Glu Pro Glu
50

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Ile Asp Leu Thr Cys His Glu Ala Gly Phe Pro Pro Ser
1 5 10

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Leu Asp Phe Ser Thr Pro Gly Arg Ala Ala Ala Ala Val Ala Phe Leu
1 5 10 15

Ser Phe Ile



(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Gln Ser Ser Asn Ser Thr Ser
1 5

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 347 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Gln Lys Tyr Ser Ile Glu Gln Leu Thr Thr Tyr Trp Leu Gln Pro Gly
1 5 10 15

Asp Asp Phe Glu Glu Ala Ile Arg Val Tyr Ala Lys Val Ala Leu Arg
20 25 30

Pro Asp Cys Lys Tyr Lys Ile Ser Lys Leu Val Asn Ile Arg Asn Cys
35 40 45

Cys Tyr Ile Ser Gly Asn Gly Ala Glu Val Glu Ile Asp Thr Glu Asp
50 55 60

Arg Val Ala Phe Arg Cys Ser Met Ile Asn Met Trp Pro Gly Val Leu
65 70 75 80

Gly Met Asp Gly Val Val Ile Met Asn Val Arg Phe Thr Gly Pro Asn
85 90 95

Phe Ser Gly Thr Val Phe Leu Ala Asn Thr Asn Leu Ile Leu His Gly
100 105 110

Val Ser Phe Tyr Gly Phe Asn Asn Thr Cys Val Glu Ala Trp Thr Asp
115 120 125

Val Arg Val Arg Gly Cys Ala Phe Tyr Cys Cys Trp Lys Gly Val Val
130 135 140

Cys Arg Pro Lys Ser Arg Ala Ser Ile Lys Lys Cys Leu Phe Glu Arg
145 150 155 160

Cys Thr Leu Gly Ile Leu Ser Glu Gly Asn Ser Arg Val Arg His Asn
165 170 175

Val Ala Ser Asp Cys Gly Cys Phe Met Leu Val Lys Ser Val Ala Val
180 185 190

Ile Lys His Asn Met Val Cys Gly Asn Cys Glu Asp Arg Ala Ser Gln
195 200 205

Met Leu Thr Cys Ser Asp Gly Asn Cys His Leu Leu Lys Thr Ile His
210 215 220

Val Ala Ser His Ser Arg Lys Ala Trp Pro Val Phe Glu His Asn Ile
225 230 235 240

Leu His Arg Cys Ser Leu His Leu Gly Asn Arg Arg Gly Val Phe Leu
245 250 255

Pro Tyr Gln Cys Asn Leu Ser His Thr Lys Ile Leu Leu Glu Pro Glu
260 265 270

Ser Met Ser Lys Val Asn Leu Asn Gly Val Phe Asp Met Thr Met Lys
275 280 285

Ile Trp Lys Val Leu Arg Tyr Asp Glu Thr Arg Thr Arg Cys Arg Pro
290 295 300

Cys Glu Cys Gly Gly Lys His Ile Arg Asn Gln Pro Val Met Leu Asp
305 310 315 320

Val Thr Glu Glu Leu Arg Pro Asp His Leu Val Leu Ala Cys His Arg
325 330 335

Ala Glu Phe Gly Ser Ser Asp Glu Asp Thr Asp
340 345

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 140 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

Met Ser Thr Asn Ser Phe Asp Gly Ser Ile Val Ser Ser Tyr Leu Thr
1 5 10 15

Thr Arg Met Pro Pro Trp Ala Gly Val Arg Gln Asn Val Met Gly Ser
20 25 30

Ser Ile Asp Gly Arg Pro Val Leu Pro Ala Asn Ser Thr Thr Leu Thr
35 40 45

Tyr Glu Thr Val Ser Gly Thr Pro Leu Glu Thr Ala Ala Ser Ala Ala
50 55 60

Ala Ser Ala Ala Ala Ala Thr Ala Arg Gly Ile Val Thr Asp Phe Ala
65 70 75 80

Phe Leu Ser Pro Leu Ala Ser Ser Ala Ala Ser Arg Ser Ser Ala Arg
85 90 95

Asp Asp Lys Leu Thr Ala Leu Leu Ala Gln Leu Asp Ser Leu Thr Arg
100 105 110

Glu Leu Asn Val Val Ser Gln Gln Leu Leu Asp Leu Arg Gln Gln Val
115 120 125

Ser Ala Leu Lys Ala Ser Ser Pro Pro Asn Ala Val
130 135 140

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5100 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 2..418

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

C CTC ATC AAA CAA CCC GTG GTG GGC ACC ACC CAC GTG GAA ATG CCT 46
Leu Ile Lys Gln Pro Val Val Gly Thr Thr His Val Glu Met Pro
1 5 10 15

CGC AAC GAA GTC CTA GAA CAA CAT CTG ACC TCA CAT GGC GCT CAA ATC 94
Arg Asn Glu Val Leu Glu Gln His Leu Thr Ser His Gly Ala Gln Ile
20 25 30

GCG GGC GGA GGC GCT GCG GGC GAT TAC TTT AAA AGC CCC ACT TCA GCT 142
Ala Gly Gly Gly Ala Ala Gly Asp Tyr Phe Lys Ser Pro Thr Ser Ala
35 40 45

CGA ACC CTT ATC CCG CTC ACC GCC TCC TGC TTA AGA CCA GAT GGA GTC 190
Arg Thr Leu Ile Pro Leu Thr Ala Ser Cys Leu Arg Pro Asp Gly Val
50 55 60

TTT CAA CTA GGA GGA GGC TCG CGT TCA TCT TTC AAC CCC CTG CAA ACA 238
Phe Gln Leu Gly Gly Gly Ser Arg Ser Ser Phe Asn Pro Leu Gln Thr
65 70 75

GAT TTT GCC TTC CAC GCC CTG CCC TCC AGA CCG CGC CAC GGG GGC ATA 286
Asp Phe Ala Phe His Ala Leu Pro Ser Arg Pro Arg His Gly Gly Ile
80 85 90 95

GGA TCC AGG CAG TTT GTA GAG GAA TTT GTG CCC GCC GTC TAC CTC AAC 334
Gly Ser Arg Gln Phe Val Glu Glu Phe Val Pro Ala Val Tyr Leu Asn
100 105 110

CCC TAC TCG GGA CCG CCG GAC TCT TAT CCG GAC CAG TTT ATA CGC CAC 382
Pro Tyr Ser Gly Pro Pro Asp Ser Tyr Pro Asp Gln Phe Ile Arg His
115 120 125

TAC AAC GTG TAC AGC AAC TCT GTG AGC GGT TAT AGC TGAGATTGTA 428
Tyr Asn Val Tyr Ser Asn Ser Val Ser Gly Tyr Ser
130 135



AGACTCTCCT ATCTGTCTCT GTGCTGCTTT TCCGCTTCAA GCCCCACAAG CATGAAGGGG 488

TTTCTGCTCA TCTTCAGCCT GCTTGTGCAT TGTCCCCTAA TTCATGTTGG GACCATTAGC 548

TTCTATGCTG CAAGGCCCGG GTCTGAGCCT AACGCGACTT ATGTTTGTGA CTATGGAAGC 608

GAGTCAGATT ACAACCCCAC CACGGTTCTG TGGTTGGCTC GAGAGACCGA TGGCTCCTGG 668

ATCTCTGTTC TTTTCCGTCA CAACGGCTCC TCAACTGCAG CCCCCGGGGT CGTCGCGCAC 728

TTTACTGACC ACAACAGCAG CATTGTGGTG CCCCAGTATT ACCTCCTCAA CAACTCACTC 788

TCTAAGCTCT GCTGCTCATA CCGGCACAAC GAGCGTTCTC AGTTTACCTG CAAACAAGCT 848

GACGTCCCTA CCTGTCACGA GCCCGGCAAG CCGCTCACCC TCCGCGTCTC CCCCGCGCTG 908

GGAACTGCCC ACCAAGCAGT CACTTGGTTT TTTCAAAATG TACCCATAGC TACTGTTTAC 968

CGACCTTGGG GCAATGTAAC TTGGTTTTGT CCTCCCTTCA TGTGTACCTT TAATGTCAGC 1028

CTGAACTCCC TACTTATTTA CAACTTTTCT GACAAAACCG GGGGGCAATA CACAGCTCTC 1088

ATGCACTCCG GACCTGCTTC CCTCTTTCAG CTCTTTAAGC CAACGACTTG TGTCACCAAG 1148

GTGGAGGACC CGCCGTATGC CAACGACCCG GCCTCGCCTG TGTGGCGCCC ACTGCTTTTT 1208
GCCTTCGTCC TCTGCACCGG CTGCGCGGTG TTGTTAACCG CCTTCGGTCC ATCGATTCTA 1268 
TCCGGTACCC GAAAGCTTAT CTCAGCCCGC TTTTGGAGTC CCGAGCCCTA TACCACCCTC 1328 
CACTAACAGT CCCCCCATGG AGCCAGACGG AGTTCATGCC GAGCAGCAGT TTATCCTCAA 1388
TCAGATTTCC TGCGCCAACA CTGCCCTCCA GCGTCAAAGG GAGGAACTAG CTTCCCTTGT 1448
CATGTTGCAT GCCTGTAAGC GTGGCCTCTT TTGTCCAGTC AAAACTTACA AGCTCAGCCT 1508 
CAACGCCTCG GCCAGCGAGC ACAGCCTGCA CTTTGAAAAA AGTCCCTCCC GATTCACCCT 1568 
GGTCAACACT CACGCCGGAG CTTCTGTGCG AGTGGCCCTA CACCACCAGG GAGCTTCCGG 1628 
CAGCATCCGC TGTTCCTGTT CCCACGCCGA GTGCCTCCCC GTCCTCCTCA AGACCCTCTG 1688 
TGCCTTTAAC TTTTTAGATT AgJtGAAAGC AAATATAAAA TGGTGTGCTT ACCGTAATTC 1748 
TGTTTTGACT TGTGTGCTTG ATTTCTCCCC CTGCGCCGTA ATCCAGTGCC CCTCTTCAAA 1808 
ACTCTCGTAC CCTATGCGAT TCGCATAGGC ATATTTTCTA AAAGCTCTGA AGTCAACATC 1868 
ACTCTCAAAC ACTTCTCCGT TGTAGGTTAC TTTCATCTAC AGATAAAGTC ATCCACCGGT 1928 
TAACATCATG AAGAGAAGTG TGCCCCAGGA CTTTAATCTT GTGTATCCGT ACAAGGCTAA 1988 
GAGGCCCAAC ATCATGCCGC CCTTTTTTGA CCGCAATGGC TTTGTTGAAA ACCAAGAAGC 2048 
CACGCTAGCC ATGCTTGTGG AAAAGCCGCT CACGTTCGAC AAGGAAGGTG CGCTGACCCT 2108 

GGGCGTCGGA CGCGGCATCC GCATTAACCC CGCGGGGCTT CTGGAGACAA ACGACCTCGC 2168 

GTCCGCTGTC TTCCCACCGC TGGCCTCCGA TGAGGCCGGC AACGTCACGC TCAACATGTC 2228

TGACGGGCTA TATACTAAGG ACAACAAGCT AGCTGTCAAA GTAGGTCCCG CGCTGTCCCT 2288 

CGACTCCAAT AATGCTCTCC AGGTCCACAC AGGCGACGGG CTCACGGTAA CCGATGACAA 2348 

GGTGTCTCTA AATACCCAAG CTCCCCTCTC GACCACCAGC GCGGGCCTCT CCCTACTTCT 2408 

GGGTCCCAGC CTCCACTTAG GTGAGGAGGA ACGACTAACA GTAAACACCG GAGCGGGCCT 2468 

CCAAATTAGC AATAACGCTC TGGCCGTAAA AGTAGGTTCA GGTATCACCG TAGATGCTCA 2528 

AAACCAGCTC GCTGCATCCC TGGGGGACGG TCTAGAAAGC AGAGATAATA AAACTGTCGT 2588 

TAAGGCTGGG CCCGGACTTA CAATAACTAA TCAAGCTCTT ACTGTTGCTA CCGGGAACGG 2648 

CCTTCAGGTC AACCCGGAAG GGCAACTGCA GCTAAACATT ACTGCCGGTC AGGGCCTCAA 2708

CTTTGCAAAC AACAGCCTCG CCGTGGAGCT GGGCTCGGGC CTCCATTTTC CCCCTGGCCA 2768

AAACCAAGTA AGCCTTTATC CCGGAGATGG AATAGACATC CGAGATAATA GGGTGACTGT 2828 

GCCCGCTGGG CCAGGCCTGA GAATGCTCAA CCACCAACTT GCCGTAGCTT CCGGAGACGG 2888

TTTAGAAGTC CACAGCGACA CCCTCCGGTT AAAGCTCTCC CACGGCCTGA CATTTGAAAA 2948 

TGGCGCCGTA CGAGCAAAAC TAGGACCAGG ACTTGGCACA GACGACTCTG GTCGGTCCGT 3008 

GGTTCGCACA GGTCGAGGAC TTAGAGTTGC AAACGGCCAA GTCCAGATCT TCAGCGGAAG 3068 

AGGCACCGCC ATCGGCACTG ATAGCAGCCT CACTCTCAAC ATCCGGGCGC CCCTACAATT 3128 

TTCTGGACCC GCCTTGACTG CTAGTTTGCA AGGCAGTGGT CCGATTACTT ACAACAGCAA 3188 

CAATGGCACT TTCGGTCTCT CTATAGGCCC CGGAATGTGG GTAGACCAAA ACAGACTTCA 3248 

GGTAAACCCA GGCGCTGGTT TAGTCTTCCA AGGAAACAAC CTTGTCCCAA ACCTTGCGGA 3308 

TCCGCTGGCT ATTTCCGACA GCAAAATTAG TCTCAGTCTC GGTCCCGGCC TGACCCAAGC 3368 

TTCCAACGCC CTGACTTTAA GTTTAGGAAA CGGGCTTGAA TTCTCCAATC AAGCCGTTGC 3428 

TATAAAAGCG GGCCGGGGCT TACGCTTTGA GTCTTCCTCA CAAGCTTTAG AGAGCAGCCT 3488 
CACAGTCGGA AATGGCTTAA CGCTTACCGA TACTGTGATC CGCCCCAACC TAGGGGACGG 3548

CCTAGAGGTC AGAGACAATA AAATCATTGT TAAGCTGGGC GCGAATCTTC GTTTTGAAAA 3608 

CGGAGCCGTA ACCGCCGGCA CCGTTAACCC TTCTGCGCCC GAGGCACCAC CAACTCTCAC 3668 

TGCAGAACCA CCCCTCCGAG CCTCCAACTC CCATCTTCAA CTGTCCCTAT CGGAGGGCTT 3728 

GGTTGTGCAT AACAACGCCC TTGCTCTCCA ACTGGGAGAC GGCATGGAAG TAAATCAGCA 3788

CGGACTTACT TTAAGAGTAG GCTCGGGTTT GCAAATGCGT GACGGCATTT TAACAGTTAC 3848 

ACCCAGCGGC ACTCCTATTG AGCCCAGACT GACTGCCCCA CTGACTCAGA CAGAGAATGG 3908 

AATCGGGCTC GCTCTCGGCG CCGGCTTGGA ATTAGACGAG AGCGCGCTCC AAGTAAAAGT 3968 

TGGGCCCGGC ATGCGCCTGA ACCCTGTAGA AAAGTATGTA ACCCTGCTCC TGGGTCCTGG 4028 

CCTTAGTTTT GGGCAGCCGG CCAACAGGAC AAATTATGAT GTGCGCGTTT CTGTGGAGCC 4088

CCCCATGGTT TTCGGACAGC GTGGTCAGCT CACATTTTTA GTGGGTCACG GACTACACAT 4148 

TCAAAATTCC AAACTTCAGC TCAATTTGGG ACAAGGCCTC AGAACTGACC CCGTCACCAA 4208 

CCAGCTGGAA GTGCCCCTCG GTCAAGGTTT GGAAATTGCA GACGAATCCC AGGTTAGGGT 4268 

TAAATTGGGC GATGGCCTGC AGTTTGATTC ACAAGCTCGC ATCACTACCG CTCCTAACAT 4328

GGTCACTGAA ACTCTGTGGA CCGGAACAGG CAGTAATGCT AATGTTACAT GGCGGGGCTA 4388

CACTGCCCCC GGCAGCAAAC TCTTTTTGAG TCTCACTCGG TTCAGCACTG GTCTAGTTTT 4448 

AGGAAACATG ACTATTGACA GCAATGCATC CTTTGGGCAA TACATTAACG CGGGACACGA 4508 

ACAGATCGAA TGCTTTATAT TGTTGGACAA TCAGGGTAAC CTAAAAGAAG GATCTAACTT 4568 

GCAAGGCACT TGGGAAGTGA AGAACAACCC CTCTGCTTCC AAAGCTGCTT TTTTGCCTTC 4628 

CACCGCCCTA TACCCCATCC TCAACGAAAG CCGAGGGAGT CTTCCTGGAA AAAATCTTGT 4688

GGGCATGCAA GCCATACTGG GAGGCGGGGG CACTTGCACT GTGATAGCCA CCCTCAATGG 4748 

CAGACGCAGC AACAACTATC CCGCGGGCCA GTCCATAATT TTCGTGTGGC AAGAATTCAA 4808

CACCATAGCC CGCCAACCTC TGAACCACTC TACACTTACT TTTTCTTACT GGACTTAAAT 4868 

AAGTTGGAAA TAAAGAGTTA AACTGAATGT TTAAGTGCAA CAGACTTTTA TTGGTTTTGG 4928 

CTCACAACAA ATTACAACAG CATAGACAAG TCATACCGGT CAAACAACAC AGGCTCTCGA 4988

AAACGGGCTA ACCGCTCCAA GAATCTGTCA CGCAGACGAG CAAGTCCTAA ATGTTTTTTC 5048 

ACTCTCTTCG GGGCCAAGTT CAGCATGTAT CGGATTTTCT GCTTACACCT TT 5100

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 139 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

Leu Ile Lys Gln Pro Val Val Gly Thr Thr His Val Glu Met Pro Arg
1 5 10 15

Asn Glu Val Leu Glu Gln His Leu Thr Ser His Gly Ala Gln Ile Ala
20 25 30

Gly Gly Gly Ala Ala Gly Asp Tyr Phe Lys Ser Pro Thr Ser Ala Arg
35 40 45

Thr Leu Ile Pro Leu Thr Ala Ser Cys Leu Arg Pro Asp Gly Val Phe
50 55 60

Gln Leu Gly Gly Gly Ser Arg Ser Ser Phe Asn Pro Leu Gln Thr Asp
65 70 75 80

Phe Ala Phe His Ala Leu Pro Ser Arg Pro Arg His Gly Gly Ile Gly
85 90 95

Ser Arg Gln Phe Val Glu Glu Phe Val Pro Ala Val Tyr Leu Asn Pro
100 105 110

Tyr Ser Gly Pro Pro Asp Ser Tyr Pro Asp Gln Phe Ile Arg His Tyr
115 120 125

Asn Val Tyr Ser Asn Ser Val Ser Gly Tyr Ser
130 135

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5100 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 408..1331

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60 

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120 

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240 

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360 



TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGA GCG GTT ATA 416
Ala Val Ile
1

GCT GAG ATT GTA AGA CTC TCC TAT CTG TCT CTG TGC TGC TTT TCC GCT 464 
Ala Glu He Val Arg Leu Ser Tyr Leu Ser Leu Cys Cys Phe Ser Ala 
5 10 15 

TCA AGC CCC ACA AGC ATG AAG GGG TTT CTG CTC ATC TTC AGC CTG CTT 512 
Ser Ser Pro Thr Ser Met Lys Gly Phe Leu Leu He Phe Ser Leu Leu 
20 25 30 35 

GTG CAT TGT CCC CTA ATT CAT GTT GGG ACC ATT AGC TTC TAT GCT GCA 560 
30 Val His Cys Pro Leu He His Val Gly Thr He Ser Phe Tyr Ala Ala 
40 45 50 

AGG CCC GGG TCT GAG CCT AAC GCG ACT TAT GTT TGT GAC TAT GGA AGC 608 
Arg Pro Gly Ser Glu Pro Asn Ala Thr Tyr Val Cys Asp Tyr Gly Ser 
55 60 65 

GAG TCA GAT TAC AAC CCC ACC ACG GTT CTG TGG TTG GCT CGA GAG ACC 656 
Glu Ser Asp Tyr Asn Pro Thr Thr Val Leu Trp Leu Ala Arg Glu Thr 
70 75 80 



35 



GAT GGC TCC TGG ATC TCT GTT CTT TTC CGT CAC AAC GGC TCC TCA ACT 704 
Asp Gly Ser Trp He Ser Val Leu Phe Arg His Asn Gly Ser Ser Thr 
85 90 95 

GCA GCC CCC GGG GTC GTC GCG CAC TTT ACT GAC CAC AAC AGC AGC ATT 752 
Ala Ala Pro Gly Val Val Ala His Phe Thr Asp His Asn Ser Ser Ile
100 105 110 115 
GTG GTG CCC CAG TAT TAC CTC CTC AAC AAC TCA CTC TCT AAG CTC TGC 800
Val Val Pro Gln Tyr Tyr Leu Leu Asn Asn Ser Leu Ser Lys Leu Cys
120 125 130 

TGC TCA TAC CGG CAC AAC GAG CGT TCT CAG TTT ACC TGC AAA CAA GCT 848 
Cys Ser Tyr Arg His Asn Glu Arg Ser Gln Phe Thr Cys Lys Gln Ala
135 140 145 

GAC GTC CCT ACC TGT CAC GAG CCC GGC AAG CCG CTC ACC CTC CGC GTC 896
Asp Val Pro Thr Cys His Glu Pro Gly Lys Pro Leu Thr Leu Arg Val
150 155 160 

TCC CCC GCG CTG GGA ACT GCC CAC CAA GCA GTC ACT TGG TTT TTT CAA 944 
Ser Pro Ala Leu Gly Thr Ala His Gln Ala Val Thr Trp Phe Phe Gln
165 170 175 

AAT GTA CCC ATA GCT ACT GTT TAC CGA CCT TGG GGC AAT GTA ACT TGG 992 
Asn Val Pro Ile Ala Thr Val Tyr Arg Pro Trp Gly Asn Val Thr Trp
180 185 190 195

TTT TGT CCT CCC TTC ATG TGT ACC TTT AAT GTC AGC CTG AAC TCC CTA 1040 
Phe Cys Pro Pro Phe Met Cys Thr Phe Asn Val Ser Leu Asn Ser Leu 
200 205 210 

CTT ATT TAC AAC TTT TCT GAC AAA ACC GGG GGG CAA TAC ACA GCT CTC 1088 
Leu Ile Tyr Asn Phe Ser Asp Lys Thr Gly Gly Gln Tyr Thr Ala Leu
215 220 225 

ATG CAC TCC GGA CCT GCT TCC CTC TTT CAG CTC TTT AAG CCA ACG ACT 1136
Met His Ser Gly Pro Ala Ser Leu Phe Gln Leu Phe Lys Pro Thr Thr
230 235 240 

TGT GTC ACC AAG GTG GAG GAC CCG CCG TAT GCC AAC GAC CCG GCC TCG 1184 
Cys Val Thr Lys Val Glu Asp Pro Pro Tyr Ala Asn Asp Pro Ala Ser 
245 250 255 

CCT GTG TGG CGC CCA CTG CTT TTT GCC TTC GTC CTC TGC ACC GGC TGC 1232
Pro Val Trp Arg Pro Leu Leu Phe Ala Phe Val Leu Cys Thr Gly Cys 
260 265 270 275

GCG GTG TTG TTA ACC GCC TTC GGT CCA TCG ATT CTA TCC GGT ACC CGA 1280 
Ala Val Leu Leu Thr Ala Phe Gly Pro Ser Ile Leu Ser Gly Thr Arg
280 285 290 

AAG CTT ATC TCA GCC CGC TTT TGG AGT CCC GAG CCC TAT ACC ACC CTC 1328 
Lys Leu Ile Ser Ala Arg Phe Trp Ser Pro Glu Pro Tyr Thr Thr Leu
295 300 305 

CAC TAACAGTCCC CCCATGGAGC CAGACGGAGT TCATGCCGAG CAGCAGTTTA 1381
His 

TCCTCAATCA GATTTCCTGC GCCAACACTG CCCTCCAGCG TCAAAGGGAG GAACTAGCTT 1441 

CCCTTGTCAT GTTGCATGCC TGTAAGCGTG GCCTCTTTTG TCCAGTCAAA ACTTACAAGC 1501 

TCAGCCTCAA CGCCTCGGCC AGCGAGCACA GCCTGCACTT TGAAAAAAGT CCCTCCCGAT 1561 

TCACCCTGGT CAACACTCAC GCCGGAGCTT CTGTGCGAGT GGCCCTACAC CACCAGGGAG 1621

CTTCCGGCAG CATCCGCTGT TCCTGTTCCC ACGCCGAGTG CCTCCCCGTC CTCCTCAAGA 1681 

CCCTCTGTGC CTTTAACTTT TTAGATTAGC TGAAAGCAAA TATAAAATGG TGTGCTTACC 1741

GTAATTCTGT TTTGACTTGT GTGCTTGATT TCTCCCCCTG CGCCGTAATC CAGTGCCCCT 1801 

CTTCAAAACT CTCGTACCCT ATGCGATTCG CATAGGCATA TTTTCTAAAA GCTCTGAAGT 1861 

CAACATCACT CTCAAACACT TCTCCGTTGT AGGTTACTTT CATCTACAGA TAAAGTCATC 1921

CACCGGTTAA CATCATGAAG AGAAGTGTGC CCCAGGACTT TAATCTTGTG TATCCGTACA 1981 

AGGCTAAGAG GCCCAACATC ATGCCGCCCT TTTTTGACCG CAATGGCTTT GTTGAAAACC 2041 

AAGAAGCCAC GCTAGCCATG CTTGTGGAAA AGCCGCTCAC GTTCGACAAG GAAGGTGCGC 2101 

TGACCCTGGG CGTCGGACGC GGCATCCGCA TTAACCCCGC GGGGCTTCTG GAGACAAACG 2161 
ACCTCGCGTC CGCTGTCTTC CCACCGCTGG CCTCCGATGA GGCCGGCAAC GTCACGCTCA 2221

ACATGTCTGA CGGGCTATAT ACTAAGGACA ACAAGCTAGC TGTCAAAGTA GGTCCCGGGC 2281 

TGTCCCTCGA CTCCAATAAT GCTCTCCAGG TCCACACAGG CGACGGGCTC ACGGTAACCG 2341 

ATGACAAGGT GTCTCTAAAT ACCCAAGCTC CCCTCTCGAC CACCAGCGCG GGCCTCTCCC 2401 

TACTTCTGGG TCCCAGCCTC CACTTAGGTG AGGAGGAACG ACTAACAGTA AACACCGGAG 2461

CGGGCCTCCA AATTAGCAAT AACGCTCTGG CCGTAAAAGT AGGTTCAGGT ATCACCGTAG 2521 

ATGCTCAAAA CCAGCTCGCT GCATCCCTGG GGGACGGTCT AGAAAGCAGA GATAATAAAA 2581 

CTGTCGTTAA GGCTGGGCCC GGACTTACAA TAACTAATCA AGCTCTTACT GTTGCTACCG 2641 

GGAACGGCCT TCAGGTCAAC CCGGAAGGGC AACTGCAGCT AAACATTACT GCCGGTCAGG 2701 

GCCTCAACTT TGCAAACAAC AGCCTCGCCG TGGAGCTGGG CTCGGGCCTG CATTTTCCCC 2761

CTGGCCAAAA CCAAGTAAGC CTTTATCCCG GAGATGGAAT AGACATCCGA GATAATAGGG 2821 

TGACTGTGCC CGCTGGGCCA GGCCTGAGAA TGCTCAACCA CCAACTTGCC GTAGCTTCCG 2881 

GAGACGGTTT AGAAGTCCAC AGCGACACCC TCCGGTTAAA GCTCTCCCAC GGCCTGACAT 2941 

TTGAAAATGG CGCCGTACGA GCAAAACTAG GACCAGGACT TGGCACAGAC GACTCTGGTC 3001 

GGTCCGTGGT TCGCACAGGT CGAGGACTTA GAGTTGCAAA CGGCCAAGTC CAGATCTTCA 3061

GCGGAAGAGG CACCGCCATC GGCACTGATA GCAGCCTCAC TCTCAACATC CGGGCGCCCC 3121

TACAATTTTC TGGACCCGCC TTGACTGCTA GTTTGCAAGG CAGTGGTCCG ATTACTTACA 3181 

ACAGCAACAA TGGCACTTTC GGTCTCTCTA TAGGCCCCGG AATGTGGGTA GACCAAAACA 3241 

GACTTCAGGT AAACCCAGGC GCTGGTTTAG TCTTCCAAGG AAACAACCTT GTCCCAAACC 3301

TTGCGGATCC GCTGGCTATT TCCGACAGCA AAATTAGTCT CAGTCTCGGT CCCGGCCTGA 3361

CCCAAGCTTC CAACGCCCTG ACTTTAAGTT TAGGAAACGG GCTTGAATTC TCCAATCAAG 3421

CCGTTGCTAT AAAAGCGGGC CGGGGCTTAC GCTTTGAGTC TTCCTCACAA GCTTTAGAGA 3481 

GCAGCCTCAC AGTCGGAAAT GGCTTAACGC TTACCGATAC TGTGATCCGC CCCAACCTAG 3541 

GGGACGGCCT AGAGGTCAGA GACAATAAAA TCATTGTTAA GCTGGGCGCG AATCTTCGTT 3601 

TTGAAAACGG AGCCGTAACC GCCGGCACCG TTAACCCTTC TGCGCCCGAG GCACCACCAA 3661

CTCTCACTGC AGAACCACCC CTCCGAGCCT CCAACTCCCA TCTTCAACTG TCCCTATCGG 3721 

AGGGCTTGGT TGTGCATAAC AACGCCCTTG CTCTCCAACT GGGAGACGGC ATGGAAGTAA 3781 

ATCAGCACGG ACTTACTTTA AGAGTAGGCT CGGGTTTGCA AATGCGTGAC GGCATTTTAA 3841 

CAGTTACACC CAGCGGCACT CCTATTGAGC CCAGACTGAC TGCCCCACTG ACTCAGACAG 3901 

AGAATGGAAT CGGGCTCGCT CTCGGCGCCG GCTTGGAATT AGACGAGAGC GCGCTCCAAG 3961

TAAAAGTTGG GCCCGGCATG CGCCTGAACC CTGTAGAAAA GTATGTAACC CTGCTCCTGG 4021 

GTCCTGGCCT TAGTTTTGGG CAGCCGGCCA ACAGGACAAA TTATGATGTG CGCGTTTCTG 4081 

TGGAGCCCCC CATGGTTTTC GGACAGCGTG GTCAGCTCAC ATTTTTAGTG GGTCACGGAC 4141

TACACATTCA AAATTCCAAA CTTCAGCTCA ATTTGGGACA AGGCCTCAGA ACTGACCCCG 4201 

TCACCAACCA GCTGGAAGTG CCCCTCGGTC AAGGTTTGGA AATTGCAGAC GAATCCCAGG 4261

TTAGGGTTAA ATTGGGCGAT GGCCTGCAGT TTGATTCACA AGCTCGCATC ACTACCGCTC 4321 

CTAACATGGT CACTGAAACT CTGTGGACCG GAACAGGCAG TAATGCTAAT GTTACATGGC 4381 

GGGGCTACAC TGCCCCCGGC AGCAAACTCT TTTTGAGTCT CACTCGGTTC AGCACTGGTC 4441

TAGTTTTAGG AAACATGACT ATTGACAGCA ATGCATCCTT TGGGCAATAC ATTAACGCGG 4501 
GACACGAACA GATCGAATGC TTTATATTGT TGGACAATCA GGGTAACCTA AAAGAAGGAT 4561

CTAACTTGCA AGGCACTTGG GAAGTGAAGA ACAACCCCTC TGCTTCCAAA GCTGCTTTTT 4621 

TGCCTTCCAC CGCCCTATAC CCCATCCTCA ACGAAAGCCG AGGGAGTCTT CCTGGAAAAA 4681 

ATCTTGTGGG CATGCAAGCC ATACTGGGAG GCGGGGGCAC TTGCACTGTG ATAGCCACCC 4741

TCAATGGCAG ACGCAGCAAC AACTATCCCG CGGGCCAGTC CATAATTTTC GTGTGGCAAG 4801

AATTCAACAC CATAGCCCGC CAACCTCTGA ACCACTCTAC ACTTACTTTT TCTTACTGGA 4861 

CTTAAATAAG TTGGAAATAA AGAGTTAAAC TGAATGTTTA AGTGCAACAG ACTTTTATTG 4921 

GTTTTGGCTC ACAACAAATT ACAACAGCAT AGACAAGTCA TACCGGTCAA ACAACACAGG 4981 

CTCTCGAAAA CGGGCTAACC GCTCCAAGAA TCTGTCACGC AGACGAGCAA GTCCTAAATG 5041 

TTTTTTCACT CTCTTCGGGG CCAAGTTCAG CATGTATCGG ATTTTCTGCT TACACCTTT 5100

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 308 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

Ala Val Ile Ala Glu Ile Val Arg Leu Ser Tyr Leu Ser Leu Cys Cys
1 5 10 15

Phe Ser Ala Ser Ser Pro Thr Ser Met Lys Gly Phe Leu Leu Ile Phe
20 25 30 

Ser Leu Leu Val His Cys Pro Leu Ile His Val Gly Thr Ile Ser Phe
35 40 45 

Tyr Ala Ala Arg Pro Gly Ser Glu Pro Asn Ala Thr Tyr Val Cys Asp 
50 55 60 

Tyr Gly Ser Glu Ser Asp Tyr Asn Pro Thr Thr Val Leu Trp Leu Ala 
65 70 75 80 

Arg Glu Thr Asp Gly Ser Trp Ile Ser Val Leu Phe Arg His Asn Gly
85 90 95

Ser Ser Thr Ala Ala Pro Gly Val Val Ala His Phe Thr Asp His Asn 
100 105 110 

Ser Ser Ile Val Val Pro Gln Tyr Tyr Leu Leu Asn Asn Ser Leu Ser
115 120 125 



Lys Leu Cys Cys Ser Tyr Arg His Asn Glu Arg Ser Gln Phe Thr Cys
130 135 140 

Lys Gln Ala Asp Val Pro Thr Cys His Glu Pro Gly Lys Pro Leu Thr
145 150 155 160 

Leu Arg Val Ser Pro Ala Leu Gly Thr Ala His Gln Ala Val Thr Trp
165 170 175 

Phe Phe Gln Asn Val Pro Ile Ala Thr Val Tyr Arg Pro Trp Gly Asn
180 185 190 

Val Thr Trp Phe Cys Pro Pro Phe Met Cys Thr Phe Asn Val Ser Leu
195 200 205 

Asn Ser Leu Leu Ile Tyr Asn Phe Ser Asp Lys Thr Gly Gly Gln Tyr
210 215 220 

Thr Ala Leu Met His Ser Gly Pro Ala Ser Leu Phe Gln Leu Phe Lys
225 230 235 240 



Pro Thr Thr Cys Val Thr Lys Val Glu Asp Pro Pro Tyr Ala Asn Asp 
245 250 255 

Pro Ala Ser Pro Val Trp Arg Pro Leu Leu Phe Ala Phe Val Leu Cys 
260 265 270 

Thr Gly Cys Ala Val Leu Leu Thr Ala Phe Gly Pro Ser Ile Leu Ser
275 280 285 

Gly Thr Arg Lys Leu Ile Ser Ala Arg Phe Trp Ser Pro Glu Pro Tyr
290 295 300 

Thr Thr Leu His 
305 
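
The CDS coordinates given in the FEATURE blocks of this listing can be checked against the stated lengths of the corresponding protein entries. The following is a minimal sketch, assuming 1-based inclusive coordinates and no stop codon inside the stated range; the function name `cds_codon_count` and the pairing table are illustrative and are not part of the listing.

```python
def cds_codon_count(location: str) -> int:
    """Return the number of whole codons in a 'start..end' CDS location."""
    start_text, end_text = location.split("..")
    start, end = int(start_text), int(end_text)
    length = end - start + 1  # coordinates are 1-based and inclusive
    if length % 3 != 0:
        raise ValueError(f"CDS {location} is not a whole number of codons")
    return length // 3


# CDS location of each nucleotide entry, paired with the stated length
# (in amino acids) of the protein entry that follows it in the listing.
FEATURES = {
    "SEQ ID NO:17 / NO:18": ("408..1331", 308),
    "SEQ ID NO:19 / NO:20": ("529..954", 142),
    "SEQ ID NO:21 / NO:22": ("1246..1707", 154),
    "SEQ ID NO:23 / NO:24": ("1439..1702", 88),
}

for name, (location, stated_length) in FEATURES.items():
    codons = cds_codon_count(location)
    status = "consistent" if codons == stated_length else "MISMATCH"
    print(f"{name}: {codons} codons, stated {stated_length} aa ({status})")
```

A check of this kind is mainly useful for catching transcription errors in a LOCATION or LENGTH field; it does not validate the sequences themselves.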

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5100 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 529..954



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAA TTC ATG TTG 537

Phe Met Leu 

1 

GGA CCA TTA GCT TCT ATG CTG CAA GGC CCG GGT CTG AGC CTA ACG CGA 585
Gly Pro Leu Ala Ser Met Leu Gln Gly Pro Gly Leu Ser Leu Thr Arg
5 10 15 

CTT ATG TTT GTG ACT ATG GAA GCG AGT CAG ATT ACA ACC CCA CCA CGG 633
Leu Met Phe Val Thr Met Glu Ala Ser Gln Ile Thr Thr Pro Pro Arg
20 25 30 35 

TTC TGT GGT TGG CTC GAG AGA CCG ATG GCT CCT GGA TCT CTG TTC TTT 681
Phe Cys Gly Trp Leu Glu Arg Pro Met Ala Pro Gly Ser Leu Phe Phe 
40 45 50 

TCC GTC ACA ACG GCT CCT CAA CTG CAG CCC CCG GGG TCG TCG CGC ACT 729
Ser Val Thr Thr Ala Pro Gln Leu Gln Pro Pro Gly Ser Ser Arg Thr
55 60 65 

TTA CTG ACC ACA ACA GCA GCA TTG TGG TGC CCC AGT ATT ACC TCC TCA 777
Leu Leu Thr Thr Thr Ala Ala Leu Trp Cys Pro Ser Ile Thr Ser Ser
70 75 80 



ACA ACT CAC TCT CTA AGC TCT GCT GCT CAT ACC GGC ACA ACG AGC GTT 825
Thr Thr His Ser Leu Ser Ser Ala Ala His Thr Gly Thr Thr Ser Val
85 90 95 
CTC AGT TTA CCT GCA AAC AAG CTG ACG TCC CTA CCT GTC ACG AGC CCG 873
Leu Ser Leu Pro Ala Asn Lys Leu Thr Ser Leu Pro Val Thr Ser Pro 
100 105 110 115 

GCA AGC CGC TCA CCC TCC GCG TCT CCC CCG CGC TGG GAA CTG CCC ACC 921 
Ala Ser Arg Ser Pro Ser Ala Ser Pro Pro Arg Trp Glu Leu Pro Thr 
120 125 130 

AAG CAG TCA CTT GGT TTT TTC AAA ATG TAC CCA TAGCTACTGT TTACCGACCT 974
Lys Gln Ser Leu Gly Phe Phe Lys Met Tyr Pro
135 140 

TGGGGCAATG TAACTTGGTT TTGTCCTCCC TTCATGTGTA CCTTTAATGT CAGCCTGAAC 1034 

TCCCTACTTA TTTACAACTT TTCTGACAAA ACCGGGGGGC AATACACAGC TCTCATGCAC 1094 

TCCGGACCTG CTTCCCTCTT TCAGCTCTTT AAGCCAACGA CTTGTGTCAC CAAGGTGGAG 1154 

GACCCGCCGT ATGCCAACGA CCCGGCCTCG CCTGTGTGGC GCCCACTGCT TTTTGCCTTC 1214

GTCCTCTGCA CCGGCTGCGC GGTGTTGTTA ACCGCCTTCG GTCCATCGAT TCTATCCGGT 1274 

ACCCGAAAGC TTATCTCAGC CCGCTTTTGG AGTCCCGAGC CCTATACCAC CCTCCACTAA 1334 

CAGTCCCCCC ATGGAGCCAG ACGGAGTTCA TGCCGAGCAG CAGTTTATCC TCAATCAGAT 1394 

TTCCTGCGCC AACACTGCCC TCCAGCGTCA AAGGGAGGAA CTAGCTTCCC TTGTCATGTT 1454 

GCATGCCTGT AAGCGTGGCC TCTTTTGTCC AGTCAAAACT TACAAGCTCA GCCTCAACGC 1514

CTCGGCCAGC GAGCACAGCC TGCACTTTGA AAAAAGTCCC TCCCGATTCA CCCTGGTCAA 1574 

CACTCACGCC GGAGCTTCTG TGCGAGTGGC CCTACACCAC CAGGGAGCTT CCGGCAGCAT 1634

CCGCTGTTCC TGTTCCCACG CCGAGTGCCT CCCCGTCCTC CTCAAGACCC TCTGTGCCTT 1694 

TAACTTTTTA GATTAGCTGA AAGCAAATAT AAAATGGTGT GCTTACCGTA ATTCTGTTTT 1754 

GACTTGTGTG CTTGATTTCT CCCCCTGCGC CGTAATCCAG TGCCCCTCTT CAAAACTCTC 1814

GTACCCTATG CGATTCGCAT AGGCATATTT TCTAAAAGCT CTGAAGTCAA CATCACTCTC 1874 

AAACACTTCT CCGTTGTAGG TTACTTTCAT CTACAGATAA AGTCATCCAC CGGTTAACAT 1934 

CATGAAGAGA AGTGTGCCCC AGGACTTTAA TCTTGTGTAT CCGTACAAGG CTAAGAGGCC 1994

CAACATCATG CCGCCCTTTT TTGACCGCAA TGGCTTTGTT GAAAACCAAG AAGCCACGCT 2054 

AGCCATGCTT GTGGAAAAGC CGCTCACGTT CGACAAGGAA GGTGCGCTGA CCCTGGGCGT 2114

CGGACGCGGC ATCCGCATTA ACCCCGCGGG GCTTCTGGAG ACAAACGACC TCGCGTCCGC 2174 

TGTCTTCCCA CCGCTGGCCT CCGATGAGGC CGGCAACGTC ACGCTCAACA TGTCTGACGG 2234 

GCTATATACT AAGGACAACA AGCTAGCTGT CAAAGTAGGT CCCGGGCTGT CCCTCGACTC 2294 

CAATAATGCT CTCCAGGTCC ACACAGGCGA CGGGCTCACG GTAACCGATG ACAAGGTGTC 2354 

TCTAAATACC CAAGCTCCCC TCTCGACCAC CAGCGCGGGC CTCTCCCTAC TTCTGGGTCC 2414

CAGCCTCCAC TTAGGTGAGG AGGAACGACT AACAGTAAAC ACCGGAGCGG GCCTCCAAAT 2474 

TAGCAATAAC GCTCTGGCCG TAAAAGTAGG TTCAGGTATC ACCGTAGATG CTCAAAACCA 2534 

GCTCGCTGCA TCCCTGGGGG ACGGTCTAGA AAGCAGAGAT AATAAAACTG TCGTTAAGGC 2594 

TGGGCCCGGA CTTACAATAA CTAATCAAGC TCTTACTGTT GCTACCGGGA ACGGCCTTCA 2654 

GGTCAACCCG GAAGGGCAAC TGCAGCTAAA CATTACTGCC GGTCAGGGCC TCAACTTTGC 2714

AAACAACAGC CTCGCCGTGG AGCTGGGCTC GGGCCTGCAT TTTCCCCCTG GCCAAAACCA 2774 

AGTAAGCCTT TATCCCGGAG ATGGAATAGA CATCCGAGAT AATAGGGTGA CTGTGCCCGC 2834 

TGGGCCAGGC CTGAGAATGC TCAACCACCA ACTTGCCGTA GCTTCCGGAG ACGGTTTAGA 2894 

AGTCCACAGC GACACCCTCC GGTTAAAGCT CTCCCACGGC CTGACATTTG AAAATGGCGC 2954 
CGTACGAGCA AAACTAGGAC CAGGACTTGG CACAGACGAC TCTGGTCGGT CCGTGGTTCG 3014
CACAGGTCGA GGACTTAGAG TTGCAAACGG CCAAGTCCAG ATCTTCAGCG GAAGAGGCAC 3074 
CGCCATCGGC ACTGATAGCA GCCTCACTCT CAACATCCGG GCGCCCCTAC AATTTTCTGG 3134 
ACCCGCCTTG ACTGCTAGTT TGCAAGGCAG TGGTCCGATT ACTTACAACA GCAACAATGG 3194 
CACTTTCGGT CTCTCTATAG GCCCCGGAAT GTGGGTAGAC CAAAACAGAC TTCAGGTAAA 3254

CCCAGGCGCT GGTTTAGTCT TCCAAGGAAA CAACCTTGTC CCAAACCTTG CGGATCCGCT 3314 

GGCTATTTCC GACAGCAAAA TTAGTCTCAG TCTCGGTCCC GGCCTGACCC AAGCTTCCAA 3374 

CGCCCTGACT TTAAGTTTAG GAAACGGGCT TGAATTCTCC AATCAAGCCG TTGCTATAAA 3434 

AGCGGGCCGG GGCTTACGCT TTGAGTCTTC CTCACAAGCT TTAGAGAGCA GCCTCACAGT 3494 

CGGAAATGGC TTAACGCTTA CCGATACTGT GATCCGCCCC AACCTAGGGG ACGGCCTAGA 3554

GGTCAGAGAC AATAAAATCA TTGTTAAGCT GGGCGCGAAT CTTCGTTTTG AAAACGGAGC 3614 

CGTAACCGCC GGCACCGTTA ACCCTTCTGC GCCCGAGGCA CCACCAACTC TCACTGCAGA 3674 

ACCACCCCTC CGAGCCTCCA ACTCCCATCT TCAACTGTCC CTATCGGAGG GCTTGGTTGT 3734 

GCATAACAAC GCCCTTGCTC TCCAACTGGG AGACGGCATG GAAGTAAATC AGCACGGACT 3794 

TACTTTAAGA GTAGGCTCGG GTTTGCAAAT GCGTGACGGC ATTTTAACAG TTACACCCAG 3854

CGGCACTCCT ATTGAGCCCA GACTGACTGC CCCACTGACT CAGACAGAGA ATGGAATCGG 3914 

GCTCGCTCTC GGCGCCGGCT TGGAATTAGA CGAGAGCGCG CTCCAAGTAA AAGTTGGGCC 3974 

CGGCATGCGC CTGAACCCTG TAGAAAAGTA TGTAACCCTG CTCCTGGGTC CTGGCCTTAG 4034 

TTTTGGGCAG CCGGCCAACA GGACAAATTA TGATGTGCGC GTTTCTGTGG AGCCCCCCAT 4094 

GGTTTTCGGA CAGCGTGGTC AGCTCACATT TTTAGTGGGT CACGGACTAC ACATTCAAAA 4154

TTCCAAACTT CAGCTCAATT TGGGACAAGG CCTCAGAACT GACCCCGTCA CCAACCAGCT 4214 

GGAAGTGCCC CTCGGTCAAG GTTTGGAAAT TGCAGACGAA TCCCAGGTTA GGGTTAAATT 4274

GGGCGATGGC CTGCAGTTTG ATTCACAAGC TCGCATCACT ACCGCTCCTA ACATGGTCAC 4334 

TGAAACTCTG TGGACCGGAA CAGGCAGTAA TGCTAATGTT ACATGGCGGG GCTACACTGC 4394 

CCCCGGCAGC AAACTCTTTT TGAGTCTCAC TCGGTTCAGC ACTGGTCTAG TTTTAGGAAA 4454

CATGACTATT GACAGCAATG CATCCTTTGG GCAATACATT AACGCGGGAC ACGAACAGAT 4514 

CGAATGCTTT ATATTGTTGG ACAATCAGGG TAACCTAAAA GAAGGATCTA ACTTGCAAGG 4574 

CACTTGGGAA GTGAAGAACA ACCCCTCTGC TTCCAAAGCT GCTTTTTTGC CTTCCACCGC 4634 

CCTATACCCC ATCCTCAACG AAAGCCGAGG GAGTCTTCCT GGAAAAAATC TTGTGGGCAT 4694 

GCAAGCCATA CTGGGAGGCG GGGGCACTTG CACTGTGATA GCCACCCTCA ATGGCAGACG 4754

CAGCAACAAC TATCCCGCGG GCCAGTCCAT AATTTTCGTG TGGCAAGAAT TCAACACCAT 4814 

AGCCCGCCAA CCTCTGAACC ACTCTACACT TACTTTTTCT TACTGGACTT AAATAAGTTG 4874 

GAAATAAAGA GTTAAACTGA ATGTTTAAGT GCAACAGACT TTTATTGGTT TTGGCTCACA 4934 

ACAAATTACA ACAGCATAGA CAAGTCATAC CGGTCAAACA ACACAGGCTC TCGAAAACGG 4994 

GCTAACCGCT CCAAGAATCT GTCACGCAGA CGAGCAAGTC CTAAATGTTT TTTCACTCTC 5054

TTCGGGGCCA AGTTCAGCAT GTATCGGATT TTCTGCTTAC ACCTTT 5100 

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 142 amino acids 
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

Phe Met Leu Gly Pro Leu Ala Ser Met Leu Gln Gly Pro Gly Leu Ser
1 5 10 15

Leu Thr Arg Leu Met Phe Val Thr Met Glu Ala Ser Gln Ile Thr Thr
20 25 30 

Pro Pro Arg Phe Cys Gly Trp Leu Glu Arg Pro Met Ala Pro Gly Ser 
35 40 45 

Leu Phe Phe Ser Val Thr Thr Ala Pro Gln Leu Gln Pro Pro Gly Ser
50 55 60 

Ser Arg Thr Leu Leu Thr Thr Thr Ala Ala Leu Trp Cys Pro Ser Ile
65 70 75 80 

Thr Ser Ser Thr Thr His Ser Leu Ser Ser Ala Ala His Thr Gly Thr 
85 90 95 

Thr Ser Val Leu Ser Leu Pro Ala Asn Lys Leu Thr Ser Leu Pro Val 
100 105 110 

Thr Ser Pro Ala Ser Arg Ser Pro Ser Ala Ser Pro Pro Arg Trp Glu
115 120 125 

Leu Pro Thr Lys Gln Ser Leu Gly Phe Phe Lys Met Tyr Pro
130 135 140 

(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5100 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1246..1707



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60 

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120 

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360 

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420 

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAATT CATGTTGGGA 540

CCATTAGCTT CTATGCTGCA AGGCCCGGGT CTGAGCCTAA CGCGACTTAT GTTTGTGACT 600 

ATGGAAGCGA GTCAGATTAC AACCCCACCA CGGTTCTGTG GTTGGCTCGA GAGACCGATG 660 

GCTCCTGGAT CTCTGTTCTT TTCCGTCACA ACGGCTCCTC AACTGCAGCC CCCGGGGTCG 720

TCGCGCACTT TACTGACCAC AACAGCAGCA TTGTGGTGCC CCAGTATTAC CTCCTCAACA 780 
ACTCACTCTC TAAGCTCTGC TGCTCATACC GGCACAACGA GCGTTCTCAG TTTACCTGCA 840

AACAAGCTGA CGTCCCTACC TGTCACGAGC CCGGCAAGCC GCTCACCCTC CGCGTCTCCC 900 

CCGCGCTGGG AACTGCCCAC CAAGCAGTCA CTTGGTTTTT TCAAAATGTA CCCATAGCTA 960 

CTGTTTACCG ACCTTGGGGC AATGTAACTT GGTTTTGTCC TCCCTTCATG TGTACCTTTA 1020 

ATGTCAGCCT GAACTCCCTA CTTATTTACA ACTTTTCTGA CAAAACCGGG GGGCAATACA 1080

CAGCTCTCAT GCACTCCGGA CCTGCTTCCC TCTTTCAGCT CTTTAAGCCA ACGACTTGTG 1140 

TCACCAAGGT GGAGGACCCG CCGTATGCCA ACGACCCGGC CTCGCCTGTG TGGCGCCCAC 1200 

TGCTTTTTGC CTTCGTCCTC TGCACCGGCT GCGCGGTGTT GTTAA CCG CCT TCG 1254 

Pro Pro Ser
1

GTC CAT CGA TTC TAT CCG GTA CCC GAA AGC TTA TCT CAG CCC GCT TTT 1302
Val His Arg Phe Tyr Pro Val Pro Glu Ser Leu Ser Gln Pro Ala Phe
5 10 15

GGA GTC CCG AGC CCT ATA CCA CCC TCC ACT AAC AGT CCC CCC ATG GAG 1350
Gly Val Pro Ser Pro Ile Pro Pro Ser Thr Asn Ser Pro Pro Met Glu
20 25 30 35

CCA GAC GGA GTT CAT GCC GAG CAG CAG TTT ATC CTC AAT CAG ATT TCC 1398 
Pro Asp Gly Val His Ala Glu Gln Gln Phe Ile Leu Asn Gln Ile Ser
40 45 50

TGC GCC AAC ACT GCC CTC CAG CGT CAA AGG GAG GAA CTA GCT TCC CTT 1446 
Cys Ala Asn Thr Ala Leu Gln Arg Gln Arg Glu Glu Leu Ala Ser Leu
55 60 65 

GTC ATG TTG CAT GCC TGT AAG CGT GGC CTC TTT TGT CCA GTC AAA ACT 1494 
Val Met Leu His Ala Cys Lys Arg Gly Leu Phe Cys Pro Val Lys Thr 
70 75 80 

TAC AAG CTC AGC CTC AAC GCC TCG GCC AGC GAG CAC AGC CTG CAC TTT 1542
Tyr Lys Leu Ser Leu Asn Ala Ser Ala Ser Glu His Ser Leu His Phe 
85 90 95 

GAA AAA AGT CCC TCC CGA TTC ACC CTG GTC AAC ACT CAC GCC GGA GCT 1590 
Glu Lys Ser Pro Ser Arg Phe Thr Leu Val Asn Thr His Ala Gly Ala 
100 105 110 115 

TCT GTG CGA GTG GCC CTA CAC CAC CAG GGA GCT TCC GGC AGC ATC CGC 1638 
Ser Val Arg Val Ala Leu His His Gln Gly Ala Ser Gly Ser Ile Arg
120 125 130

TGT TCC TGT TCC CAC GCC GAG TGC CTC CCC GTC CTC CTC AAG ACC CTC 1686 
Cys Ser Cys Ser His Ala Glu Cys Leu Pro Val Leu Leu Lys Thr Leu 
135 140 145 

TGT GCC TTT AAC TTT TTA GAT TAGCTGAAAG CAAATATAAA ATGGTGTGCT 1737 
Cys Ala Phe Asn Phe Leu Asp 
150 

TACCGTAATT CTGTTTTGAC TTGTGTGCTT GATTTCTCCC CCTGCGCCGT AATCCAGTGC 1797

CCCTCTTCAA AACTCTCGTA CCCTATGCGA TTCGCATAGG CATATTTTCT AAAAGCTCTG 1857 

AAGTCAACAT CACTCTCAAA CACTTCTCCG TTGTAGGTTA CTTTCATCTA CAGATAAAGT 1917

CATCCACCGG TTAACATCAT GAAGAGAAGT GTGCCCCAGG ACTTTAATCT TGTGTATCCG 1977

TACAAGGCTA AGAGGCCCAA CATCATGCCG CCCTTTTTTG ACCGCAATGG CTTTGTTGAA 2037 

AACCAAGAAG CCACGCTAGC CATGCTTGTG GAAAAGCCGC TCACGTTCGA CAAGGAAGGT 2097

GCGCTGACCC TGGGCGTCGG ACGCGGCATC CGCATTAACC CCGCGGGGCT TCTGGAGACA 2157 

AACGACCTCG CGTCCGCTGT CTTCCCACCG CTGGCCTCCG ATGAGGCCGG CAACGTCACG 2217 

CTCAACATGT CTGACGGGCT ATATACTAAG GACAACAAGC TAGCTGTCAA AGTAGGTCCC 2277 

GGGCTGTCCC TCGACTCCAA TAATGCTCTC CAGGTCCACA CAGGCGACGG GCTCACGGTA 2337



ACCGATGACA AGGTGTCTCT AAATACCCAA GCTCCCCTCT CGACCACCAG CGCGGGCCTC 2397 
TCCCTACTTC TGGGTCCCAG CCTCCACTTA GGTGAGGAGG AACGACTAAC AGTAAACACC 2457 
GGAGCGGGCC TCCAAATTAG CAATAACGCT CTGGCCGTAA AAGTAGGTTC AGGTATCACC 2517
GTAGATGCTC AAAACCAGCT CGCTGCATCC CTGGGGGACG GTCTAGAAAG CAGAGATAAT 2577 
AAAACTGTCG TTAAGGCTGG GCCCGGACTT ACAATAACTA ATCAAGCTCT TACTGTTGCT 2637 
ACCGGGAACG GCCTTCAGGT CAACCCGGAA GGGCAACTGC AGCTAAACAT TACTGCCGGT 2697 
CAGGGCCTCA ACTTTGCAAA CAACAGCCTC GCCGTGGAGC TGGGCTCGGG CCTGCATTTT 2757 
CCCCCTGGCC AAAACCAAGT AAGCCTTTAT CCCGGAGATG GAATAGACAT CCGAGATAAT 2817
AGGGTGACTG TGCCCGCTGG GCCAGGCCTG AGAATGCTCA ACCACCAACT TGCCGTAGCT 2877 
TCCGGAGACG GTTTAGAAGT CCACAGCGAC ACCCTCCGGT TAAAGCTCTC CCACGGCCTG 2937 
ACATTTGAAA ATGGCGCCGT ACGAGCAAAA CTAGGACCAG GACTTGGCAC AGACGACTCT 2997 
GGTCGGTCCG TGGTTCGCAC AGGTCGAGGA CTTAGAGTTG CAAACGGCCA AGTCCAGATC 3057 
TTCAGCGGAA GAGGCACCGC CATCGGCACT GATAGCAGCC TCACTCTCAA CATCCGGGCG 3117 
CCCCTACAAT TTTCTGGACC CGCCTTGACT GCTAGTTTGC AAGGCAGTGG TCCGATTACT 3177 

TACAACAGCA ACAATGGCAC TTTCGGTCTC TCTATAGGCC CCGGAATGTG GGTAGACCAA 3237

AACAGACTTC AGGTAAACCC AGGCGCTGGT TTAGTCTTCC AAGGAAACAA CCTTGTCCCA 3297 

AACCTTGCGG ATCCGCTGGC TATTTCCGAC AGCAAAATTA GTCTCAGTCT CGGTCCCGGC 3357 

CTGACCCAAG CTTCCAACGC CCTGACTTTA AGTTTAGGAA ACGGGCTTGA ATTCTCCAAT 3417 

CAAGCCGTTG CTATAAAAGC GGGCCGGGGC TTACGCTTTG AGTCTTCCTC ACAAGCTTTA 3477

GAGAGCAGCC TCACAGTCGG AAATGGCTTA ACGCTTACCG ATACTGTGAT CCGCCCCAAC 3537

CTAGGGGACG GCCTAGAGGT CAGAGACAAT AAAATCATTG TTAAGCTGGG CGCGAATCTT 3597 

CGTTTTGAAA ACGGAGCCGT AACCGCCGGC ACCGTTAACC CTTCTGCGCC CGAGGCACCA 3657 

CCAACTCTCA CTGCAGAACC ACCCCTCCGA GCCTCCAACT CCCATCTTCA ACTGTCCCTA 3717 

TCGGAGGGCT TGGTTGTGCA TAACAACGCC CTTGCTCTCC AACTGGGAGA CGGCATGGAA 3777 

GTAAATCAGC ACGGACTTAC TTTAAGAGTA GGCTCGGGTT TGCAAATGCG TGACGGCATT 3837

TTAACAGTTA CACCCAGCGG CACTCCTATT GAGCCCAGAC TGACTGCCCC ACTGACTCAG 3897

ACAGAGAATG GAATCGGGCT CGCTCTCGGC GCCGGCTTGG AATTAGACGA GAGCGCGCTC 3957 

CAAGTAAAAG TTGGGCCCGG CATGCGCCTG AACCCTGTAG AAAAGTATGT AACCCTGCTC 4017

CTGGGTCCTG GCCTTAGTTT TGGGCAGCCG GCCAACAGGA CAAATTATGA TGTGCGCGTT 4077

TCTGTGGAGC CCCCCATGGT TTTCGGACAG CGTGGTCAGC TCACATTTTT AGTGGGTCAC 4137 

GGACTACACA TTCAAAATTC CAAACTTCAG CTCAATTTGG GACAAGGCCT CAGAACTGAC 4197 

CCCGTCACCA ACCAGCTGGA AGTGCCCCTC GGTCAAGGTT TGGAAATTGC AGACGAATCC 4257

CAGGTTAGGG TTAAATTGGG CGATGGCCTG CAGTTTGATT CACAAGCTCG CATCACTACC 4317

GCTCCTAACA TGGTCACTGA AACTCTGTGG ACCGGAACAG GCAGTAATGC TAATGTTACA 4377 

TGGCGGGGCT ACACTGCCCC CGGCAGCAAA CTCTTTTTGA GTCTCACTCG GTTCAGCACT 4437

GGTCTAGTTT TAGGAAACAT GACTATTGAC AGCAATGCAT CCTTTGGGCA ATACATTAAC 4497 

GCGGGACACG AACAGATCGA ATGCTTTATA TTGTTGGACA ATCAGGGTAA CCTAAAAGAA 4557 

GGATCTAACT TGCAAGGCAC TTGGGAAGTG AAGAACAACC CCTCTGCTTC CAAAGCTGCT 4617 

TTTTTGCCTT CCACCGCCCT ATACCCCATC CTCAACGAAA GCCGAGGGAG TCTTCCTGGA 4677 
AAAAATCTTG TGGGCATGCA AGCCATACTG GGAGGCGGGG GCACTTGCAC TGTGATAGCC 4737

ACCCTCAATG GCAGACGCAG CAACAACTAT CCCGCGGGCC AGTCCATAAT TTTCGTGTGG 4797 

CAAGAATTCA ACACCATAGC CCGCCAACCT CTGAACCACT CTACACTTAC TTTTTCTTAC 4857 

TGGACTTAAA TAAGTTGGAA ATAAAGAGTT AAACTGAATG TTTAAGTGCA ACAGACTTTT 4917 

ATTGGTTTTG GCTCACAACA AATTACAACA GCATAGACAA GTCATACCGG TCAAACAACA 4977

CAGGCTCTCG AAAACGGGCT AACCGCTCCA AGAATCTGTC ACGCAGACGA GCAAGTCCTA 5037

AATGTTTTTT CACTCTCTTC GGGGCCAAGT TCAGCATGTA TCGGATTTTC TGCTTACACC 5097

TTT 5100



(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 154 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

Pro Pro Ser Val His Arg Phe Tyr Pro Val Pro Glu Ser Leu Ser Gln
1 5 10 15 

Pro Ala Phe Gly Val Pro Ser Pro Ile Pro Pro Ser Thr Asn Ser Pro
20 25 30 

Pro Met Glu Pro Asp Gly Val His Ala Glu Gln Gln Phe Ile Leu Asn
35 40 45 

Gln Ile Ser Cys Ala Asn Thr Ala Leu Gln Arg Gln Arg Glu Glu Leu
50 55 60

Ala Ser Leu Val Met Leu His Ala Cys Lys Arg Gly Leu Phe Cys Pro
65 70 75 80 

Val Lys Thr Tyr Lys Leu Ser Leu Asn Ala Ser Ala Ser Glu His Ser 
85 90 95 



Leu His Phe Glu Lys Ser Pro Ser Arg Phe Thr Leu Val Asn Thr His 
100 105 110 

Ala Gly Ala Ser Val Arg Val Ala Leu His His Gln Gly Ala Ser Gly
115 120 125 

Ser Ile Arg Cys Ser Cys Ser His Ala Glu Cys Leu Pro Val Leu Leu
130 135 140 

Lys Thr Leu Cys Ala Phe Asn Phe Leu Asp 
145 150 

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1439..1702



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120 

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240 

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300 

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420 

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480 

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAATT CATGTTGGGA 540 

CCATTAGCTT CTATGCTGCA AGGCCCGGGT CTGAGCCTAA CGCGACTTAT GTTTGTGACT 600 

ATGGAAGCGA GTCAGATTAC AACCCCACCA CGGTTCTGTG GTTGGCTCGA GAGACCGATG 660

GCTCCTGGAT CTCTGTTCTT TTCCGTCACA ACGGCTCCTC AACTGCAGCC CCCGGGGTCG 720 

TCGCGCACTT TACTGACCAC AACAGCAGCA TTGTGGTGCC CCAGTATTAC CTCCTCAACA 780 

ACTCACTCTC TAAGCTCTGC TGCTCATACC GGCACAACGA GCGTTCTCAG TTTACCTGCA 840 

AACAAGCTGA CGTCCCTACC TGTCACGAGC CCGGCAAGCC GCTCACCCTC CGCGTCTCCC 900 

CCGCGCTGGG AACTGCCCAC CAAGCAGTCA CTTGGTTTTT TCAAAATGTA CCCATAGCTA 960

CTGTTTACCG ACCTTGGGGC AATGTAACTT GGTTTTGTCC TCCCTTCATG TGTACCTTTA 1020 

ATGTCAGCCT GAACTCCCTA CTTATTTACA ACTTTTCTGA CAAAACCGGG GGGCAATACA 1080

CAGCTCTCAT GCACTCCGGA CCTGCTTCCC TCTTTCAGCT CTTTAAGCCA ACGACTTGTG 1140

TCACCAAGGT GGAGGACCCG CCGTATGCCA ACGACCCGGC CTCGCCTGTG TGGCGCCCAC 1200 

TGCTTTTTGC CTTCGTCCTC TGCACCGGCT GCGCGGTGTT GTTAACCGCC TTCGGTCCAT 1260

CGATTCTATC CGGTACCCGA AAGCTTATCT CAGCCCGCTT TTGGAGTCCC GAGCCCTATA 1320 

CCACCCTCCA CTAACAGTCC CCCCATGGAG CCAGACGGAG TTCATGCCGA GCAGCAGTTT 1380 

ATCCTCAATC AGATTTCCTG CGCCAACACT GCCCTCCAGC GTCAAAGGGA GGAACTAG 1438 

CTT CCC TTG TCA TGT TGC ATG CCT GTA AGC GTG GCC TCT TTT GTC CAG 1486
Leu Pro Leu Ser Cys Cys Met Pro Val Ser Val Ala Ser Phe Val Gln
1 5 10 15

TCA AAA CTT ACA AGC TCA GCC TCA ACG CCT CGG CCA GCG AGC ACA GCC 1534 
Ser Lys Leu Thr Ser Ser Ala Ser Thr Pro Arg Pro Ala Ser Thr Ala 
20 25 30 

TGC ACT TTG AAA AAA GTC CCT CCC GAT TCA CCC TGG TCA ACA CTC ACG 1582 
Cys Thr Leu Lys Lys Val Pro Pro Asp Ser Pro Trp Ser Thr Leu Thr 
35 40 45 

CCG GAG CTT CTG TGC GAG TGG CCC TAC ACC ACC AGG GAG CTT CCG GCA 1630
Pro Glu Leu Leu Cys Glu Trp Pro Tyr Thr Thr Arg Glu Leu Pro Ala 
50 55 60 

GCA TCC GCT GTT CCT GTT CCC ACG CCG AGT GCC TCC CCG TCC TCC TCA 1678 
Ala Ser Ala Val Pro Val Pro Thr Pro Ser Ala Ser Pro Ser Ser Ser
65 70 75 80 

AGA CCC TCT GTG CCT TTA ACT TTT TAGATTAGCT GAAAGCAAAT ATAAAATGGT 1732 
Arg Pro Ser Val Pro Leu Thr Phe 
85

GTGCTTACCG TAATTCTGTT TTGACTTGTG TGCTTGATTT CTCCCCCTGC GCCGTAATCC 1792 

AGTGCCCCTC TTCAAAACTC TCGTACCCTA TGCGATTCGC ATAGGCATAT TTTCTAAAAG 1852 

CTCTGAAGTC AACATCACTC TCAAACACTT CTCCGTTGTA GGTTACTTTC ATCTACAGAT 1912

AAAGTCATCC ACCGGTTAAC ATCATGAAGA GAAGTGTGCC CCAGGACTTT AATCTTGTGT 1972
ATCCGTACAA GGCTAAGAGG CCCAACATCA TGCCGCCCTT TTTTGACCGC AATGGCTTTG 2032
TTGAAAACCA AGAAGCCACG CTAGCCATGC TTGTGGAAAA GCCGCTCACG TTCGACAAGG 2092 
AAGGTGCGCT GACCCTGGGC GTCGGACGCG GCATCCGCAT TAACCCCGCG GGGCTTCTGG 2152 
AGACAAACGA CCTCGCGTCC GCTGTCTTCC CACCGCTGGC CTCCGATGAG GCCGGCAACG 2212 
TCACGCTCAA CATGTCTGAC GGGCTATATA CTAAGGACAA CAAGCTAGCT GTCAAAGTAG 2272
GTCCCGGGCT GTCCCTCGAC TCCAATAATG CTCTCCAGGT CCACACAGGC GACGGGCTCA 2332 
CGGTAACCGA TGACAAGGTG TCTCTAAATA CCCAAGCTCC CCTCTCGACC ACCAGCGCGG 2392 
GCCTCTCCCT ACTTCTGGGT CCCAGCCTCC ACTTAGGTGA GGAGGAACGA CTAACAGTAA 2452 
ACACCGGAGC GGGCCTCCAA ATTAGCAATA ACGCTCTGGC CGTAAAAGTA GGTTCAGGTA 2512 
TCACCGTAGA TGCTCAAAAC CAGCTCGCTG CATCCCTGGG GGACGGTCTA GAAAGCAGAG 2572
ATAATAAAAC TGTCGTTAAG GCTGGGCCCG GACTTACAAT AACTAATCAA GCTCTTACTG 2632 
TTGCTACCGG GAACGGCCTT CAGGTCAACC CGGAAGGGCA ACTGCAGCTA AACATTACTG 2692
CCGGTCAGGG CCTCAACTTT GCAAACAACA GCCTCGCCGT GGAGCTGGGC TCGGGCCTGC 2752 
ATTTTCCCCC TGGCCAAAAC CAAGTAAGCC TTTATCCCGG AGATGGAATA GACATCCGAG 2812 
ATAATAGGGT GACTGTGCCC GCTGGGCCAG GCCTGAGAAT GCTCAACCAC CAACTTGCCG 2872
TAGCTTCCGG AGACGGTTTA GAAGTCCACA GCGACACCCT CCGGTTAAAG CTCTCCCACG 2932 
GCCTGACATT TGAAAATGGC GCCGTACGAG CAAAACTAGG ACCAGGACTT GGCACAGACG 2992 
ACTCTGGTCG GTCCGTGGTT CGCACAGGTC GAGGACTTAG AGTTGCAAAC GGCCAAGTCC 3052 
AGATCTTCAG CGGAAGAGGC ACCGCCATCG GCACTGATAG CAGCCTCACT CTCAACATCC 3112 
GGGCGCCCCT ACAATTTTCT GGACCCGCCT TGACTGCTAG TTTGCAAGGC AGTGGTCCGA 3172
TTACTTACAA CAGCAACAAT GGCACTTTCG GTCTCTCTAT AGGCCCCGGA ATGTGGGTAG 3232 
ACCAAAACAG ACTTCAGGTA AACCCAGGCG CTGGTTTAGT CTTCCAAGGA AACAACCTTG 3292 

TCCCAAACCT TGCGGATCCG CTGGCTATTT CCGACAGCAA AATTAGTCTC AGTCTCGGTC 3352 

CCGGCCTGAC CCAAGCTTCC AACGCCCTGA CTTTAAGTTT AGGAAACGGG CTTGAATTCT 3412

CCAATCAAGC CGTTGCTATA AAAGCGGGCC GGGGCTTACG CTTTGAGTCT TCCTCACAAG 3472

CTTTAGAGAG CAGCCTCACA GTCGGAAATG GCTTAACGCT TACCGATACT GTGATCCGCC 3532 

CCAACCTAGG GGACGGCCTA GAGGTCAGAG ACAATAAAAT CATTGTTAAG CTGGGCGCGA 3592 

ATCTTCGTTT TGAAAACGGA GCCGTAACCG CCGGCACCGT TAACCCTTCT GCGCCCGAGG 3652 

CACCACCAAC TCTCACTGCA GAACCACCCC TCCGAGCCTC CAACTCCCAT CTTCAACTGT 3712 

CCCTATCGGA GGGCTTGGTT GTGCATAACA ACGCCCTTGC TCTCCAACTG GGAGACGGCA 3772

TGGAAGTAAA TCAGCACGGA CTTACTTTAA GAGTAGGCTC GGGTTTGCAA ATGCGTGACG 3832 

GCATTTTAAC AGTTACACCC AGCGGCACTC CTATTGAGCC CAGACTGACT GCCCCACTGA 3892 

CTCAGACAGA GAATGGAATC GGGCTCGCTC TCGGCGCCGG CTTGGAATTA GACGAGAGCG 3952 

CGCTCCAAGT AAAAGTTGGG CCCGGCATGC GCCTGAACCC TGTAGAAAAG TATGTAACCC 4012 

TGCTCCTGGG TCCTGGCCTT AGTTTTGGGC AGCCGGCCAA CAGGACAAAT TATGATGTGC 4072

GCGTTTCTGT GGAGCCCCCC ATGGTTTTCG GACAGCGTGG TCAGCTCACA TTTTTAGTGG 4132 

GTCACGGACT ACACATTCAA AATTCCAAAC TTCAGCTCAA TTTGGGACAA GGCCTCAGAA 4192 

CTGACCCCGT CACCAACCAG CTGGAAGTGC CCCTCGGTCA AGGTTTGGAA ATTGCAGACG 4252 

AATCCCAGGT TAGGGTTAAA TTGGGCGATG GCCTGCAGTT TGATTCACAA GCTCGCATCA 4312 

CTACCGCTCC TAACATGGTC ACTGAAACTC TGTGGACCGG AACAGGCAGT AATGCTAATG 4372

TTACATGGCG GGGCTACACT GCCCCCGGCA GCAAACTCTT TTTGAGTCTC ACTCGGTTCA 4432

GCACTGGTCT AGTTTTAGGA AACATGACTA TTGACAGCAA TGCATCCTTT GGGCAATACA 4492

TTAACGCGGG ACACGAACAG ATCGAATGCT TTATATTGTT GGACAATCAG GGTAACCTAA 4552

AAGAAGGATC TAACTTGCAA GGCACTTGGG AAGTGAAGAA CAACCCCTCT GCTTCCAAAG 4612

CTGCTTTTTT GCCTTCCACC GCCCTATACC CCATCCTCAA CGAAAGCCGA GGGAGTCTTC 4672

CTGGAAAAAA TCTTGTGGGC ATGCAAGCCA TACTGGGAGG CGGGGGCACT TGCACTGTGA 4732

TAGCCACCCT CAATGGCAGA CGCAGCAACA ACTATCCCGC GGGCCAGTCC ATAATTTTCG 4792

TGTGGCAAGA ATTCAACACC ATAGCCCGCC AACCTCTGAA CCACTCTACA CTTACTTTTT 4852

CTTACTGGAC TTAAATAAGT TGGAAATAAA GAGTTAAACT GAATGTTTAA GTGCAACAGA 4912

CTTTTATTGG TTTTGGCTCA CAACAAATTA CAACAGCATA GACAAGTCAT ACCGGTCAAA 4972

CAACACAGGC TCTCGAAAAC GGGCTAACCG CTCCAAGAAT CTGTCACGCA GACGAGCAAG 5032

TCCTAAATGT TTTTTCACTC TCTTCGGGGC CAAGTTCAGC ATGTATCGGA TTTTCTGCTT 5092

ACACCTTT 5100

(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 88 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

Leu Pro Leu Ser Cys Cys Met Pro Val Ser Val Ala Ser Phe Val Gln 
1 5 10 15 

Ser Lys Leu Thr Ser Ser Ala Ser Thr Pro Arg Pro Ala Ser Thr Ala 
20 25 30 

Cys Thr Leu Lys Lys Val Pro Pro Asp Ser Pro Trp Ser Thr Leu Thr 
35 40 45 

Pro Glu Leu Leu Cys Glu Trp Pro Tyr Thr Thr Arg Glu Leu Pro Ala 
50 55 60 

Ala Ser Ala Val Pro Val Pro Thr Pro Ser Ala Ser Pro Ser Ser Ser 
65 70 75 80 

Arg Pro Ser Val Pro Leu Thr Phe 
85 

(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1915..4863 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

CCTCATCAAA CAACCCGTGG TGGGCACCAC CCACGTGGAA ATGCCTCGCA ACGAAGTCCT 60 

AGAACAACAT CTGACCTCAC ATGGCGCTCA AATCGCGGGC GGAGGCGCTG CGGGCGATTA 120 

CTTTAAAAGC CCCACTTCAG CTCGAACCCT TATCCCGCTC ACCGCCTCCT GCTTAAGACC 180 

AGATGGAGTC TTTCAACTAG GAGGAGGCTC GCGTTCATCT TTCAACCCCC TGCAAACAGA 240 

TTTTGCCTTC CACGCCCTGC CCTCCAGACC GCGCCACGGG GGCATAGGAT CCAGGCAGTT 300 

TGTAGAGGAA TTTGTGCCCG CCGTCTACCT CAACCCCTAC TCGGGACCGC CGGACTCTTA 360 

TCCGGACCAG TTTATACGCC ACTACAACGT GTACAGCAAC TCTGTGAGCG GTTATAGCTG 420 

AGATTGTAAG ACTCTCCTAT CTGTCTCTGT GCTGCTTTTC CGCTTCAAGC CCCACAAGCA 480 

TGAAGGGGTT TCTGCTCATC TTCAGCCTGC TTGTGCATTG TCCCCTAATT CATGTTGGGA 540 

CCATTAGCTT CTATGCTGCA AGGCCCGGGT CTGAGCCTAA CGCGACTTAT GTTTGTGACT 600 

ATGGAAGCGA GTCAGATTAC AACCCCACCA CGGTTCTGTG GTTGGCTCGA GAGACCGATG 660 

GCTCCTGGAT CTCTGTTCTT TTCCGTCACA ACGGCTCCTC AACTGCAGCC CCCGGGGTCG 720 

TCGCGCACTT TACTGACCAC AACAGCAGCA TTGTGGTGCC CCAGTATTAC CTCCTCAACA 780 

ACTCACTCTC TAAGCTCTGC TGCTCATACC GGCACAACGA GCGTTCTCAG TTTACCTGCA 840 

AACAAGCTGA CGTCCCTACC TGTCACGAGC CCGGCAAGCC GCTCACCCTC CGCGTCTCCC 900 

CCGCGCTGGG AACTGCCCAC CAAGCAGTCA CTTGGTTTTT TCAAAATGTA CCCATAGCTA 960 

CTGTTTACCG ACCTTGGGGC AATGTAACTT GGTTTTGTCC TCCCTTCATG TGTACCTTTA 1020 

ATGTCAGCCT GAACTCCCTA CTTATTTACA ACTTTTCTGA CAAAACCGGG GGGCAATACA 1080 

CAGCTCTCAT GCACTCCGGA CCTGCTTCCC TCTTTCAGCT CTTTAAGCCA ACGACTTGTG 1140 

TCACCAAGGT GGAGGACCCG CCGTATGCCA ACGACCCGGC CTCGCCTGTG TGGCGCCCAC 1200 

TGCTTTTTGC CTTCGTCCTC TGCACCGGCT GCGCGGTGTT GTTAACCGCC TTCGGTCCAT 1260 

CGATTCTATC CGGTACCCGA AAGCTTATCT CAGCCCGCTT TTGGAGTCCC GAGCCCTATA 1320 

CCACCCTCCA CTAACAGTCC CCCCATGGAG CCAGACGGAG TTCATGCCGA GCAGCAGTTT 1380 

ATCCTCAATC AGATTTCCTG CGCCAACACT GCCCTCCAGC GTCAAAGGGA GGAACTAGCT 1440 

TCCCTTGTCA TGTTGCATGC CTGTAAGCGT GGCCTCTTTT GTCCAGTCAA AACTTACAAG 1500 

CTCAGCCTCA ACGCCTCGGC CAGCGAGCAC AGCCTGCACT TTGAAAAAAG TCCCTCCCGA 1560 

TTCACCCTGG TCAACACTCA CGCCGGAGCT TCTGTGCGAG TGGCCCTACA CCACCAGGGA 1620 

GCTTCCGGCA GCATCCGCTG TTCCTGTTCC CACGCCGAGT GCCTCCCCGT CCTCCTCAAG 1680 

ACCCTCTGTG CCTTTAACTT TTTAGATTAG CTGAAAGCAA ATATAAAATG GTGTGCTTAC 1740 

CGTAATTCTG TTTTGACTTG TGTGCTTGAT TTCTCCCCCT GCGCCGTAAT CCAGTGCCCC 1800 

TCTTCAAAAC TCTCGTACCC TATGCGATTC GCATAGGCAT ATTTTCTAAA AGCTCTGAAG 1860 

TCAACATCAC TCTCAAACAC TTCTCCGTTG TAGGTTACTT TCATCTACAG ATAA AGT 1917 

Ser 
1 

CAT CCA CCG GTT AAC ATC ATG AAG AGA AGT GTG CCC CAG GAC TTT AAT 1965 
His Pro Pro Val Asn Ile Met Lys Arg Ser Val Pro Gln Asp Phe Asn 
5 10 15 

CTT GTG TAT CCG TAC AAG GCT AAG AGG CCC AAC ATC ATG CCG CCC TTT 2013 
Leu Val Tyr Pro Tyr Lys Ala Lys Arg Pro Asn Ile Met Pro Pro Phe 
20 25 30 

TTT GAC CGC AAT GGC TTT GTT GAA AAC CAA GAA GCC ACG CTA GCC ATG 2061 
Phe Asp Arg Asn Gly Phe Val Glu Asn Gln Glu Ala Thr Leu Ala Met 
35 40 45 

CTT GTG GAA AAG CCG CTC ACG TTC GAC AAG GAA GGT GCG CTG ACC CTG 2109 
Leu Val Glu Lys Pro Leu Thr Phe Asp Lys Glu Gly Ala Leu Thr Leu 
50 55 60 65 

GGC GTC GGA CGC GGC ATC CGC ATT AAC CCC GCG GGG CTT CTG GAG ACA 2157 
Gly Val Gly Arg Gly Ile Arg Ile Asn Pro Ala Gly Leu Leu Glu Thr 
70 75 80 

AAC GAC CTC GCG TCC GCT GTC TTC CCA CCG CTG GCC TCC GAT GAG GCC 2205 
Asn Asp Leu Ala Ser Ala Val Phe Pro Pro Leu Ala Ser Asp Glu Ala 
85 90 95 

GGC AAC GTC ACG CTC AAC ATG TCT GAC GGG CTA TAT ACT AAG GAC AAC 2253 
Gly Asn Val Thr Leu Asn Met Ser Asp Gly Leu Tyr Thr Lys Asp Asn 
100 105 110 

AAG CTA GCT GTC AAA GTA GGT CCC GGG CTG TCC CTC GAC TCC AAT AAT 2301 
Lys Leu Ala Val Lys Val Gly Pro Gly Leu Ser Leu Asp Ser Asn Asn 
115 120 125 

GCT CTC CAG GTC CAC ACA GGC GAC GGG CTC ACG GTA ACC GAT GAC AAG 2349 
Ala Leu Gln Val His Thr Gly Asp Gly Leu Thr Val Thr Asp Asp Lys 
130 135 140 145 

GTG TCT CTA AAT ACC CAA GCT CCC CTC TCG ACC ACC AGC GCG GGC CTC 2397 
Val Ser Leu Asn Thr Gln Ala Pro Leu Ser Thr Thr Ser Ala Gly Leu 
150 155 160 

TCC CTA CTT CTG GGT CCC AGC CTC CAC TTA GGT GAG GAG GAA CGA CTA 2445 
Ser Leu Leu Leu Gly Pro Ser Leu His Leu Gly Glu Glu Glu Arg Leu 
165 170 175 

ACA GTA AAC ACC GGA GCG GGC CTC CAA ATT AGC AAT AAC GCT CTG GCC 2493 
Thr Val Asn Thr Gly Ala Gly Leu Gln Ile Ser Asn Asn Ala Leu Ala 
180 185 190 

GTA AAA GTA GGT TCA GGT ATC ACC GTA GAT GCT CAA AAC CAG CTC GCT 2541 
Val Lys Val Gly Ser Gly Ile Thr Val Asp Ala Gln Asn Gln Leu Ala 
195 200 205 

CCA TCC CTG GGG GAC GGT CTA GAA AGC AGA GAT AAT AAA ACT GTC GTT 2589 
Ala Ser Leu Gly Asp Gly Leu Glu Ser Arg Asp Asn Lys Thr Val Val 
210 215 220 225 

AAG GCT GGG CCC GGA CTT ACA ATA ACT AAT CAA GCT CTT ACT GTT GCT 2637 
Lys Ala Gly Pro Gly Leu Thr Ile Thr Asn Gln Ala Leu Thr Val Ala 
230 235 240 

ACC GGG AAC GGC CTT CAG GTC AAC CCG GAA GGG CAA CTG CAG CTA AAC 2685 
Thr Gly Asn Gly Leu Gln Val Asn Pro Glu Gly Gln Leu Gln Leu Asn 
245 250 255 

ATT ACT GCC GGT CAG GGC CTC AAC TTT GCA AAC AAC AGC CTC GCC GTG 2733 
Ile Thr Ala Gly Gln Gly Leu Asn Phe Ala Asn Asn Ser Leu Ala Val 
260 265 270 

GAG CTG GGC TCG GGC CTG CAT TTT CCC CCT GGC CAA AAC CAA GTA AGC 2781 
Glu Leu Gly Ser Gly Leu His Phe Pro Pro Gly Gln Asn Gln Val Ser 
275 280 285 

CTT TAT CCC GGA GAT GGA ATA GAC ATC CGA GAT AAT AGG GTG ACT GTG 2829 
Leu Tyr Pro Gly Asp Gly Ile Asp Ile Arg Asp Asn Arg Val Thr Val 
290 295 300 305 

CCC GCT GGG CCA GGC CTG AGA ATG CTC AAC CAC CAA CTT GCC GTA GCT 2877 
Pro Ala Gly Pro Gly Leu Arg Met Leu Asn His Gln Leu Ala Val Ala 
310 315 320 

TCC GGA GAC GGT TTA GAA GTC CAC AGC GAC ACC CTC CGG TTA AAG CTC 2925 
Ser Gly Asp Gly Leu Glu Val His Ser Asp Thr Leu Arg Leu Lys Leu 
325 330 335 

TCC CAC GGC CTG ACA TTT GAA AAT GGC GCC GTA CGA GCA AAA CTA GGA 2973 
Ser His Gly Leu Thr Phe Glu Asn Gly Ala Val Arg Ala Lys Leu Gly 
340 345 350 

CCA GGA CTT GGC ACA GAC GAC TCT GGT CGG TCC GTG GTT CGC ACA GGT 3021 
Pro Gly Leu Gly Thr Asp Asp Ser Gly Arg Ser Val Val Arg Thr Gly 
355 360 365 

CGA GGA CTT AGA GTT GCA AAC GGC CAA GTC CAG ATC TTC AGC GGA AGA 3069 
Arg Gly Leu Arg Val Ala Asn Gly Gln Val Gln Ile Phe Ser Gly Arg 
370 375 380 385 

GGC ACC GCC ATC GGC ACT GAT AGC AGC CTC ACT CTC AAC ATC CGG GCG 3117 
Gly Thr Ala Ile Gly Thr Asp Ser Ser Leu Thr Leu Asn Ile Arg Ala 
390 395 400 

CCC CTA CAA TTT TCT GGA CCC GCC TTG ACT GCT AGT TTG CAA GGC AGT 3165 
Pro Leu Gln Phe Ser Gly Pro Ala Leu Thr Ala Ser Leu Gln Gly Ser 
405 410 415 

GGT CCG ATT ACT TAC AAC AGC AAC AAT GGC ACT TTC GGT CTC TCT ATA 3213 
Gly Pro Ile Thr Tyr Asn Ser Asn Asn Gly Thr Phe Gly Leu Ser Ile 
420 425 430 

GGC CCC GGA ATG TGG GTA GAC CAA AAC AGA CTT CAG GTA AAC CCA GGC 3261 
Gly Pro Gly Met Trp Val Asp Gln Asn Arg Leu Gln Val Asn Pro Gly 
435 440 445 

GCT GGT TTA GTC TTC CAA GGA AAC AAC CTT GTC CCA AAC CTT GCG GAT 3309 
Ala Gly Leu Val Phe Gln Gly Asn Asn Leu Val Pro Asn Leu Ala Asp 
450 455 460 465 

CCG CTG GCT ATT TCC GAC AGC AAA ATT AGT CTC AGT CTC GGT CCC GGC 3357 
Pro Leu Ala Ile Ser Asp Ser Lys Ile Ser Leu Ser Leu Gly Pro Gly 
470 475 480 

CTG ACC CAA GCT TCC AAC GCC CTG ACT TTA AGT TTA GGA AAC GGG CTT 3405 
Leu Thr Gln Ala Ser Asn Ala Leu Thr Leu Ser Leu Gly Asn Gly Leu 
485 490 495 

GAA TTC TCC AAT CAA GCC GTT GCT ATA AAA GCG GGC CGG GGC TTA CGC 3453 
Glu Phe Ser Asn Gln Ala Val Ala Ile Lys Ala Gly Arg Gly Leu Arg 
500 505 510 

TTT GAG TCT TCC TCA CAA GCT TTA GAG AGC AGC CTC ACA GTC GGA AAT 3501 
Phe Glu Ser Ser Ser Gln Ala Leu Glu Ser Ser Leu Thr Val Gly Asn 
515 520 525 

GGC TTA ACG CTT ACC GAT ACT GTG ATC CGC CCC AAC CTA GGG GAC GGC 3549 
Gly Leu Thr Leu Thr Asp Thr Val Ile Arg Pro Asn Leu Gly Asp Gly 
530 535 540 545 

CTA GAG GTC AGA GAC AAT AAA ATC ATT GTT AAG CTG GGC GCG AAT CTT 3597 
Leu Glu Val Arg Asp Asn Lys Ile Ile Val Lys Leu Gly Ala Asn Leu 
550 555 560 

CGT TTT GAA AAC GGA GCC GTA ACC GCC GGC ACC GTT AAC CCT TCT GCG 3645 
Arg Phe Glu Asn Gly Ala Val Thr Ala Gly Thr Val Asn Pro Ser Ala 
565 570 575 

CCC GAG GCA CCA CCA ACT CTC ACT GCA GAA CCA CCC CTC CGA GCC TCC 3693 
Pro Glu Ala Pro Pro Thr Leu Thr Ala Glu Pro Pro Leu Arg Ala Ser 
580 585 590 

AAC TCC CAT CTT CAA CTG TCC CTA TCG GAG GGC TTG GTT GTG CAT AAC 3741 
Asn Ser His Leu Gln Leu Ser Leu Ser Glu Gly Leu Val Val His Asn 
595 600 605 

AAC GCC CTT GCT CTC CAA CTG GGA GAC GGC ATG GAA GTA AAT CAG CAC 3789 
Asn Ala Leu Ala Leu Gln Leu Gly Asp Gly Met Glu Val Asn Gln His 
610 615 620 625 

GGA CTT ACT TTA AGA GTA GGC TCG GGT TTG CAA ATG CGT GAC GGC ATT 3837 
Gly Leu Thr Leu Arg Val Gly Ser Gly Leu Gln Met Arg Asp Gly Ile 
630 635 640 

TTA ACA GTT ACA CCC AGC GGC ACT CCT ATT GAG CCC AGA CTG ACT GCC 3885 
Leu Thr Val Thr Pro Ser Gly Thr Pro Ile Glu Pro Arg Leu Thr Ala 
645 650 655 

CCA CTG ACT CAG ACA GAG AAT GGA ATC GGG CTC GCT CTC GGC GCC GGC 3933 
Pro Leu Thr Gln Thr Glu Asn Gly Ile Gly Leu Ala Leu Gly Ala Gly 
660 665 670 
TTG GAA TTA GAC GAG AGC GCG CTC CAA GTA AAA GTT GGG CCC GGC ATG 3981 
Leu Glu Leu Asp Glu Ser Ala Leu Gln Val Lys Val Gly Pro Gly Met 
675 680 685 

CGC CTG AAC CCT GTA GAA AAG TAT GTA ACC CTG CTC CTG GGT CCT GGC 4029 
Arg Leu Asn Pro Val Glu Lys Tyr Val Thr Leu Leu Leu Gly Pro Gly 
690 695 700 705 

CTT AGT TTT GGG CAG CCG GCC AAC AGG ACA AAT TAT GAT GTG CGC GTT 4077 
Leu Ser Phe Gly Gln Pro Ala Asn Arg Thr Asn Tyr Asp Val Arg Val 
710 715 720 

TCT GTG GAG CCC CCC ATG GTT TTC GGA CAG CGT GGT CAG CTC ACA TTT 4125 
Ser Val Glu Pro Pro Met Val Phe Gly Gln Arg Gly Gln Leu Thr Phe 
725 730 735 

TTA GTG GGT CAC GGA CTA CAC ATT CAA AAT TCC AAA CTT CAG CTC AAT 4173 
Leu Val Gly His Gly Leu His Ile Gln Asn Ser Lys Leu Gln Leu Asn 
740 745 750 

TTG GGA CAA GGC CTC AGA ACT GAC CCC GTC ACC AAC CAG CTG GAA GTG 4221 
Leu Gly Gln Gly Leu Arg Thr Asp Pro Val Thr Asn Gln Leu Glu Val 
755 760 765 

CCC CTC GGT CAA GGT TTG GAA ATT GCA GAC GAA TCC CAG GTT AGG GTT 4269 
Pro Leu Gly Gln Gly Leu Glu Ile Ala Asp Glu Ser Gln Val Arg Val 
770 775 780 785 

AAA TTG GGC GAT GGC CTG CAG TTT GAT TCA CAA GCT CGC ATC ACT ACC 4317 
Lys Leu Gly Asp Gly Leu Gln Phe Asp Ser Gln Ala Arg Ile Thr Thr 
790 795 800 

GCT CCT AAC ATG GTC ACT GAA ACT CTG TGG ACC GGA ACA GGC AGT AAT 4365 
Ala Pro Asn Met Val Thr Glu Thr Leu Trp Thr Gly Thr Gly Ser Asn 
805 810 815 

GCT AAT GTT ACA TGG CGG GGC TAC ACT GCC CCC GGC AGC AAA CTC TTT 4413 
Ala Asn Val Thr Trp Arg Gly Tyr Thr Ala Pro Gly Ser Lys Leu Phe 
820 825 830 

TTG AGT CTC ACT CGG TTC AGC ACT GGT CTA GTT TTA GGA AAC ATG ACT 4461 
Leu Ser Leu Thr Arg Phe Ser Thr Gly Leu Val Leu Gly Asn Met Thr 
835 840 845 

ATT GAC AGC AAT GCA TCC TTT GGG CAA TAC ATT AAC GCG GGA CAC GAA 4509 
Ile Asp Ser Asn Ala Ser Phe Gly Gln Tyr Ile Asn Ala Gly His Glu 
850 855 860 865 

CAG ATC GAA TGC TTT ATA TTG TTG GAC AAT CAG GGT AAC CTA AAA GAA 4557 
Gln Ile Glu Cys Phe Ile Leu Leu Asp Asn Gln Gly Asn Leu Lys Glu 
870 875 880 

GGA TCT AAC TTG CAA GGC ACT TGG GAA GTG AAG AAC AAC CCC TCT GCT 4605 
Gly Ser Asn Leu Gln Gly Thr Trp Glu Val Lys Asn Asn Pro Ser Ala 
885 890 895 

TCC AAA GCT GCT TTT TTG CCT TCC ACC GCC CTA TAC CCC ATC CTC AAC 4653 
Ser Lys Ala Ala Phe Leu Pro Ser Thr Ala Leu Tyr Pro Ile Leu Asn 
900 905 910 

GAA AGC CGA GGG AGT CTT CCT GGA AAA AAT CTT GTG GGC ATG CAA GCC 4701 
Glu Ser Arg Gly Ser Leu Pro Gly Lys Asn Leu Val Gly Met Gln Ala 
915 920 925 

ATA CTG GGA GGC GGG GGC ACT TGC ACT GTG ATA GCC ACC CTC AAT GGC 4749 
Ile Leu Gly Gly Gly Gly Thr Cys Thr Val Ile Ala Thr Leu Asn Gly 
930 935 940 945 

AGA CGC AGC AAC AAC TAT CCC GCG GGC CAG TCC ATA ATT TTC GTG TGG 4797 
Arg Arg Ser Asn Asn Tyr Pro Ala Gly Gln Ser Ile Ile Phe Val Trp 
950 955 960 

CAA GAA TTC AAC ACC ATA GCC CGC CAA CCT CTG AAC CAC TCT ACA CTT 4845 
Gln Glu Phe Asn Thr Ile Ala Arg Gln Pro Leu Asn His Ser Thr Leu 
965 970 975 

ACT TTT TCT TAC TGG ACT TAAATAAGTT GGAAATAAAG AGTTAAACTG 4893 
Thr Phe Ser Tyr Trp Thr 
980 

AATGTTTAAG TGCAACAGAC TTTTATTGGT TTTGGCTCAC AACAAATTAC AACAGCATAG 4953 

ACAAGTCATA CCGGTCAAAC AACACAGGCT CTCGAAAACG GGCTAACCGC TCCAAGAATC 5013 

TGTCACGCAG ACGAGCAAGT CCTAAATGTT TTTTCACTCT CTTCGGGGCC AAGTTCAGCA 5073 

TGTATCGCAT TTTCTGCTTA CACCTTT 5100 
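
As a cross-check on the listing above, the CDS feature of SEQ ID NO:25 (positions 1915..4863) should span 2949 nucleotides, that is 983 codons, which agrees with the 983 amino acid length stated for SEQ ID NO:26. The following Python sketch illustrates that arithmetic and translates the first eight codons of the coding region using the standard genetic code. The helper names `cds_length` and `translate` are hypothetical and are introduced here only for illustration; they form no part of the disclosure.

```python
# Illustrative sketch only: the helper names below are assumptions.
BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order
# for the first, second and third base.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cds_length(start: int, end: int) -> int:
    """Length in nucleotides of a feature with 1-based inclusive coordinates."""
    return end - start + 1


def translate(dna: str) -> str:
    """Translate DNA codon by codon into one-letter amino acid codes."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


n = cds_length(1915, 4863)
print(n, n // 3)  # nucleotides and codons spanned by the CDS feature
print(translate("AGT CAT CCA CCG GTT AAC ATC ATG"))  # first eight codons
```

The eight codons used in the last line are those shown at positions 1915 to 1938 of SEQ ID NO:25, and their translation corresponds to the residues Ser His Pro Pro Val Asn Ile Met at the start of SEQ ID NO:26.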

(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 983 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 



Ser His Pro Pro Val Asn Ile Met Lys Arg Ser Val Pro Gln Asp Phe 
1 5 10 15 

Asn Leu Val Tyr Pro Tyr Lys Ala Lys Arg Pro Asn Ile Met Pro Pro 
20 25 30 

Phe Phe Asp Arg Asn Gly Phe Val Glu Asn Gln Glu Ala Thr Leu Ala 
35 40 45 

Met Leu Val Glu Lys Pro Leu Thr Phe Asp Lys Glu Gly Ala Leu Thr 
50 55 60 

Leu Gly Val Gly Arg Gly Ile Arg Ile Asn Pro Ala Gly Leu Leu Glu 
65 70 75 80 

Thr Asn Asp Leu Ala Ser Ala Val Phe Pro Pro Leu Ala Ser Asp Glu 
85 90 95 

Ala Gly Asn Val Thr Leu Asn Met Ser Asp Gly Leu Tyr Thr Lys Asp 
100 105 110 

Asn Lys Leu Ala Val Lys Val Gly Pro Gly Leu Ser Leu Asp Ser Asn 
115 120 125 

Asn Ala Leu Gln Val His Thr Gly Asp Gly Leu Thr Val Thr Asp Asp 
130 135 140 

Lys Val Ser Leu Asn Thr Gln Ala Pro Leu Ser Thr Thr Ser Ala Gly 
145 150 155 160 

Leu Ser Leu Leu Leu Gly Pro Ser Leu His Leu Gly Glu Glu Glu Arg 
165 170 175 

Leu Thr Val Asn Thr Gly Ala Gly Leu Gln Ile Ser Asn Asn Ala Leu 
180 185 190 

Ala Val Lys Val Gly Ser Gly Ile Thr Val Asp Ala Gln Asn Gln Leu 
195 200 205 

Ala Ala Ser Leu Gly Asp Gly Leu Glu Ser Arg Asp Asn Lys Thr Val 
210 215 220 

Val Lys Ala Gly Pro Gly Leu Thr Ile Thr Asn Gln Ala Leu Thr Val 
225 230 235 240 

Ala Thr Gly Asn Gly Leu Gln Val Asn Pro Glu Gly Gln Leu Gln Leu 
245 250 255 

Asn Ile Thr Ala Gly Gln Gly Leu Asn Phe Ala Asn Asn Ser Leu Ala 
260 265 270 

Val Glu Leu Gly Ser Gly Leu His Phe Pro Pro Gly Gln Asn Gln Val 
275 280 285 
Ser Leu Tyr Pro Gly Asp Gly Ile Asp Ile Arg Asp Asn Arg Val Thr 
290 295 300 

Val Pro Ala Gly Pro Gly Leu Arg Met Leu Asn His Gln Leu Ala Val 
305 310 315 320 

Ala Ser Gly Asp Gly Leu Glu Val His Ser Asp Thr Leu Arg Leu Lys 
325 330 335 

Leu Ser His Gly Leu Thr Phe Glu Asn Gly Ala Val Arg Ala Lys Leu 
340 345 350 

Gly Pro Gly Leu Gly Thr Asp Asp Ser Gly Arg Ser Val Val Arg Thr 
355 360 365 

Gly Arg Gly Leu Arg Val Ala Asn Gly Gln Val Gln Ile Phe Ser Gly 
370 375 380 

Arg Gly Thr Ala Ile Gly Thr Asp Ser Ser Leu Thr Leu Asn Ile Arg 
385 390 395 400 

Ala Pro Leu Gln Phe Ser Gly Pro Ala Leu Thr Ala Ser Leu Gln Gly 
405 410 415 

Ser Gly Pro Ile Thr Tyr Asn Ser Asn Asn Gly Thr Phe Gly Leu Ser 
420 425 430 

Ile Gly Pro Gly Met Trp Val Asp Gln Asn Arg Leu Gln Val Asn Pro 
435 440 445 

Gly Ala Gly Leu Val Phe Gln Gly Asn Asn Leu Val Pro Asn Leu Ala 
450 455 460 

Asp Pro Leu Ala Ile Ser Asp Ser Lys Ile Ser Leu Ser Leu Gly Pro 
465 470 475 480 

Gly Leu Thr Gln Ala Ser Asn Ala Leu Thr Leu Ser Leu Gly Asn Gly 
485 490 495 

Leu Glu Phe Ser Asn Gln Ala Val Ala Ile Lys Ala Gly Arg Gly Leu 
500 505 510 

Arg Phe Glu Ser Ser Ser Gln Ala Leu Glu Ser Ser Leu Thr Val Gly 
515 520 525 

Asn Gly Leu Thr Leu Thr Asp Thr Val Ile Arg Pro Asn Leu Gly Asp 
530 535 540 

Gly Leu Glu Val Arg Asp Asn Lys Ile Ile Val Lys Leu Gly Ala Asn 
545 550 555 560 

Leu Arg Phe Glu Asn Gly Ala Val Thr Ala Gly Thr Val Asn Pro Ser 
565 570 575 

Ala Pro Glu Ala Pro Pro Thr Leu Thr Ala Glu Pro Pro Leu Arg Ala 
580 585 590 

Ser Asn Ser His Leu Gln Leu Ser Leu Ser Glu Gly Leu Val Val His 
595 600 605 

Asn Asn Ala Leu Ala Leu Gln Leu Gly Asp Gly Met Glu Val Asn Gln 
610 615 620 

His Gly Leu Thr Leu Arg Val Gly Ser Gly Leu Gln Met Arg Asp Gly 
625 630 635 640 

Ile Leu Thr Val Thr Pro Ser Gly Thr Pro Ile Glu Pro Arg Leu Thr 
645 650 655 

Ala Pro Leu Thr Gln Thr Glu Asn Gly Ile Gly Leu Ala Leu Gly Ala 
660 665 670 

Gly Leu Glu Leu Asp Glu Ser Ala Leu Gln Val Lys Val Gly Pro Gly 
675 680 685 

Met Arg Leu Asn Pro Val Glu Lys Tyr Val Thr Leu Leu Leu Gly Pro 
690 695 700 

Gly Leu Ser Phe Gly Gln Pro Ala Asn Arg Thr Asn Tyr Asp Val Arg 
705 710 715 720 

Val Ser Val Glu Pro Pro Met Val Phe Gly Gln Arg Gly Gln Leu Thr 
725 730 735 

Phe Leu Val Gly His Gly Leu His Ile Gln Asn Ser Lys Leu Gln Leu 
740 745 750 

Asn Leu Gly Gln Gly Leu Arg Thr Asp Pro Val Thr Asn Gln Leu Glu 
755 760 765 

Val Pro Leu Gly Gln Gly Leu Glu Ile Ala Asp Glu Ser Gln Val Arg 
770 775 780 

Val Lys Leu Gly Asp Gly Leu Gln Phe Asp Ser Gln Ala Arg Ile Thr 
785 790 795 800 

Thr Ala Pro Asn Met Val Thr Glu Thr Leu Trp Thr Gly Thr Gly Ser 
805 810 815 

Asn Ala Asn Val Thr Trp Arg Gly Tyr Thr Ala Pro Gly Ser Lys Leu 
820 825 830 

Phe Leu Ser Leu Thr Arg Phe Ser Thr Gly Leu Val Leu Gly Asn Met 
835 840 845 

Thr Ile Asp Ser Asn Ala Ser Phe Gly Gln Tyr Ile Asn Ala Gly His 
850 855 860 

Glu Gln Ile Glu Cys Phe Ile Leu Leu Asp Asn Gln Gly Asn Leu Lys 
865 870 875 880 

Glu Gly Ser Asn Leu Gln Gly Thr Trp Glu Val Lys Asn Asn Pro Ser 
885 890 895 

Ala Ser Lys Ala Ala Phe Leu Pro Ser Thr Ala Leu Tyr Pro Ile Leu 
900 905 910 

Asn Glu Ser Arg Gly Ser Leu Pro Gly Lys Asn Leu Val Gly Met Gln 
915 920 925 

Ala Ile Leu Gly Gly Gly Gly Thr Cys Thr Val Ile Ala Thr Leu Asn 
930 935 940 

Gly Arg Arg Ser Asn Asn Tyr Pro Ala Gly Gln Ser Ile Ile Phe Val 
945 950 955 960 

Trp Gln Glu Phe Asn Thr Ile Ala Arg Gln Pro Leu Asn His Ser Thr 
965 970 975 

Leu Thr Phe Ser Tyr Trp Thr 
980 

(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 227 amino acids 
(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

Met Ser Lys Glu Ile Pro Thr Pro Tyr Met Trp Ser Tyr Gln Pro Gln 
1 5 10 15 

Met Gly Leu Ala Ala Gly Ala Ala Gln Asp Tyr Ser Thr Arg Ile Asn 
20 25 30 

Tyr Met Ser Ala Gly Pro His Met Ile Ser Arg Val Asn Gly Ile Arg 
35 40 45 

Ala His Arg Asn Arg Ile Leu Leu Glu Gln Ala Ala Ile Thr Thr Thr 
50 55 60 

Pro Arg Asn Asn Leu Asn Pro Arg Ser Trp Pro Ala Ala Leu Val Tyr 
65 70 75 80 

Gln Glu Ser Pro Ala Pro Thr Thr Val Val Leu Pro Arg Asp Ala Gln 
85 90 95 

Ala Glu Val Gln Met Thr Asn Ser Gly Ala Gln Leu Ala Gly Gly Phe 
100 105 110 

Arg His Arg Val Arg Ser Pro Gly Gln Gly Ile Thr His Leu Lys Ile 
115 120 125 

Arg Gly Arg Gly Ile Gln Leu Asn Asp Glu Ser Val Ser Ser Ser Leu 
130 135 140 

Gly Leu Arg Pro Asp Gly Thr Phe Gln Ile Gly Gly Ala Gly Arg Ser 
145 150 155 160 

Ser Phe Thr Pro Arg Gln Ala Ile Leu Thr Leu Gln Thr Ser Ser Ser 
165 170 175 

Glu Pro Arg Ser Gly Gly Ile Gly Thr Leu Gln Phe Ile Glu Glu Phe 
180 185 190 

Val Pro Ser Val Tyr Phe Asn Pro Phe Ser Gly Pro Pro Gly His Tyr 
195 200 205 

Pro Asp Gln Phe Ile Pro Asn Phe Asp Ala Val Lys Asp Ser Ala Asp 
210 215 220 

Gly Tyr Asp 
225 

(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 128 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

Met Thr Asp Thr Leu Asp Leu Glu Met Asp Gly Ile Ile Thr Glu Gln 
1 5 10 15 

Arg Leu Leu Glu Arg Arg Arg Ala Ala Ala Glu Gln Gln Arg Met Asn 
20 25 30 

Gln Glu Leu Gln Asp Met Val Asn Leu His Gln Cys Lys Arg Gly Ile 
35 40 45 

Phe Cys Leu Val Lys Gln Ala Lys Val Thr Tyr Asp Ser Asn Thr Thr 
50 55 60 

Gly His Arg Leu Ser Tyr Lys Leu Pro Thr Lys Arg Gln Lys Leu Val 
65 70 75 80 

Val Met Val Gly Glu Lys Pro Ile Thr Ile Thr Gln His Ser Val Glu 
85 90 95 

Thr Glu Gly Cys Ile His Ser Pro Cys Gln Gly Pro Glu Asp Leu Cys 
100 105 110 

Thr Leu Ile Lys Thr Leu Cys Gly Leu Lys Asp Leu Ile Pro Phe Asn 
115 120 125 



(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 582 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 



Met Lys Arg Ala Arg Pro Ser Glu Asp Thr Phe Asn Pro Val Tyr Pro 
1 5 10 15 

Tyr Asp Thr Glu Thr Gly Pro Pro Thr Val Pro Phe Leu Thr Pro Pro 
20 25 30 

Phe Val Ser Pro Asn Gly Phe Gln Glu Ser Pro Pro Gly Val Leu Ser 
35 40 45 

Leu Arg Val Ser Glu Pro Leu Asp Thr Ser His Gly Met Leu Ala Leu 
50 55 60 

Lys Met Gly Ser Gly Leu Thr Leu Asp Lys Ala Gly Asn Leu Thr Ser 
65 70 75 80 

Gln Asn Val Thr Thr Val Thr Gln Pro Leu Lys Lys Thr Lys Ser Asn 
85 90 95 

Ile Ser Leu Asp Thr Ser Ala Pro Leu Thr Ile Thr Ser Gly Ala Leu 
100 105 110 

Thr Val Ala Thr Thr Ala Pro Leu Ile Val Thr Ser Gly Ala Leu Ser 
115 120 125 

Val Gln Ser Gln Ala Pro Leu Thr Val Gln Asp Ser Lys Leu Ser Ile 
130 135 140 

Ala Thr Lys Gly Pro Ile Thr Val Ser Asp Gly Lys Leu Ala Leu Gln 
145 150 155 160 

Thr Ser Ala Pro Leu Ser Gly Ser Asp Ser Asp Thr Leu Thr Val Thr 
165 170 175 

Ala Ser Pro Pro Leu Thr Thr Ala Thr Gly Ser Leu Gly Ile Asn Met 
180 185 190 

Glu Asp Pro Ile Tyr Val Asn Asn Gly Lys Ile Gly Ile Lys Ile Ser 
195 200 205 

Gly Pro Leu Gln Val Ala Gln Asn Ser Asp Thr Leu Thr Val Val Thr 
210 215 220 

Gly Pro Gly Val Thr Val Glu Gln Asn Ser Leu Arg Thr Lys Val Ala 
225 230 235 240 

Gly Ala Ile Gly Tyr Asp Ser Ser Asn Asn Met Glu Ile Lys Thr Gly 
245 250 255 

Gly Gly Met Arg Ile Asn Asn Asn Leu Leu Ile Leu Asp Val Asp Tyr 
260 265 270 

Pro Phe Asp Ala Gln Thr Lys Leu Arg Leu Lys Leu Gly Gln Gly Pro 
275 280 285 

Leu Tyr Ile Asn Ala Ser His Asn Leu Asp Ile Asn Tyr Asn Arg Gly 
290 295 300 

Leu Tyr Leu Phe Asn Ala Ser Asn Asn Thr Lys Lys Leu Glu Val Ser 
305 310 315 320 

Ile Lys Lys Ser Ser Gly Leu Asn Phe Asp Asn Thr Ala Ile Ala Ile 
325 330 335 

Asn Ala Gly Lys Gly Leu Glu Phe Asp Thr Asn Thr Ser Glu Ser Pro 
340 345 350 

Asp Ile Asn Pro Ile Lys Thr Lys Ile Gly Ser Gly Ile Asp Tyr Asn 
355 360 365 

Glu Asn Gly Ala Met Ile Thr Lys Leu Gly Ala Gly Leu Ser Phe Asp 
370 375 380 

Asn Ser Gly Ala Ile Thr Ile Gly Asn Lys Asn Asp Asp Lys Leu Thr 
385 390 395 400 

Leu Trp Thr Thr Pro Asp Pro Ser Pro Asn Cys Arg Ile His Ser Asp 
405 410 415 

Asn Asp Cys Lys Phe Thr Leu Val Leu Thr Lys Cys Gly Ser Gln Val 
420 425 430 

Leu Ala Thr Val Ala Ala Leu Ala Val Ser Gly Asp Leu Ser Ser Met 
435 440 445 

Thr Gly Thr Val Ala Ser Val Ser Ile Phe Leu Arg Phe Asp Gln Asn 
450 455 460 

Gly Val Leu Met Glu Asn Ser Ser Leu Lys Lys His Tyr Trp Asn Phe 
465 470 475 480 

Arg Asn Gly Asn Ser Thr Asn Ala Asn Pro Tyr Thr Asn Ala Val Gly 
485 490 495 

Phe Met Pro Asn Leu Leu Ala Tyr Pro Lys Thr Gln Ser Gln Thr Ala 
500 505 510 

Lys Asn Asn Ile Val Ser Gln Val Tyr Leu His Gly Asp Lys Thr Lys 
515 520 525 

Pro Met Ile Leu Thr Ile Thr Leu Asn Gly Thr Ser Glu Ser Thr Glu 
530 535 540 

Thr Ser Glu Val Ser Thr Tyr Ser Met Ser Phe Thr Trp Ser Trp Glu 
545 550 555 560 

Ser Gly Lys Tyr Thr Thr Glu Thr Phe Ala Thr Asn Ser Tyr Thr Phe 
565 570 575 

Ser Tyr Ile Ala Gln Glu 
580 

(2) INFORMATION FOR SEQ ID NO:30: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(ix) FEATURE: 

(A) NAME/KEY: Modified-site 
(B) LOCATION: 2 

(D) OTHER INFORMATION: /note= "This position is X2." 

(ix) FEATURE: 

(A) NAME/KEY: Modified-site 

(B) LOCATION: 4 

(D) OTHER INFORMATION: /note= "This position is X13." 

(ix) FEATURE: 

(A) NAME/KEY: Modified-site 
(B) LOCATION: 6 

(D) OTHER INFORMATION: /note= "This position is X2." 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 

Cys Xaa Cys Xaa Cys Xaa Cys 
1 5 
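
Read together with its three Modified-site features, SEQ ID NO:30 appears to describe a spacing motif rather than a literal seven-residue peptide. Assuming that X2 and X13 denote runs of exactly two and thirteen arbitrary residues (an interpretation; the listing does not state this explicitly), the motif Cys-X2-Cys-X13-Cys-X2-Cys can be sketched as a regular expression over one-letter amino acid codes. The names `MOTIF` and `find_motif` are hypothetical, and the test string is a one-letter transcription of SEQ ID NO:33 of this listing as read here, used only as convenient input.

```python
import re

# Assumption: X2 and X13 mean exactly 2 and 13 residues of any kind.
MOTIF = re.compile(r"C.{2}C.{13}C.{2}C")


def find_motif(protein: str):
    """Return 0-based, half-open (start, end) spans of non-overlapping matches."""
    return [m.span() for m in MOTIF.finditer(protein)]


# One-letter transcription of SEQ ID NO:33 (57 residues), split per listing row.
seq33 = (
    "EEVTSHFFLDCPEDPS"
    "RECSSCGFHQAQSGIP"
    "GIMCSLCYMRQTYHCI"
    "YSPVSEEEM"
)
print(find_motif(seq33))
```

Under these assumptions the single match covers residues 19 to 39 of SEQ ID NO:33 in the listing's 1-based numbering (0-based span 18 to 39), that is the stretch running from Cys 19 through Cys 39.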

(2) INFORMATION FOR SEQ ID NO:31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 

Gln Ser Ser Xaa Ser Thr Ser 
1 5 

(2) INFORMATION FOR SEQ ID NO:32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

Pro Leu Leu Phe Ala Phe Val Leu Cys Thr Gly Cys Ala Val Leu Leu 
1 5 10 15 

Thr Ala Phe Gly Pro Ser Ile Leu Ser Gly Thr 
20 25 

(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

Glu Glu Val Thr Ser His Phe Phe Leu Asp Cys Pro Glu Asp Pro Ser 
1 5 10 15 

Arg Glu Cys Ser Ser Cys Gly Phe His Gln Ala Gln Ser Gly Ile Pro 
20 25 30 

Gly Ile Met Cys Ser Leu Cys Tyr Met Arg Gln Thr Tyr His Cys Ile 
35 40 45 

Tyr Ser Pro Val Ser Glu Glu Glu Met 
50 55 

(2) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

Val Asp Leu Glu Cys His Glu Val Leu Pro Pro Ser 
1 5 10 

1. A live recombinant bovine adenovirus 
vector (BAV) system selected from the group consisting 
of: 

(a) a system wherein part or all of 
the E1 gene region is replaced by a heterologous 
nucleotide sequence encoding a foreign gene or 
fragment thereof; 

(b) a system wherein a part or all of 
the E3 gene region is replaced by a heterologous 
nucleotide sequence encoding a foreign gene or 
fragment thereof; and 

(c) a system wherein part or all of 
the E1 gene region and part or all of the E3 gene 
region are deleted and a heterologous nucleotide 
sequence encoding a foreign gene or fragment thereof 
is inserted into at least one of the deletions. 

2. The BAV system of claim 1 which is a 
bovine adenovirus type 3. 

3. The BAV system of claim 1 wherein (a) a 
recombinant BAV wherein part or all of the E1 gene 
region is replaced by a heterologous nucleotide 
sequence encoding a foreign gene or fragment thereof. 

4. The BAV system of claim 1 wherein (b) a 
recombinant BAV wherein a part or all of the E3 gene 
region is replaced by a heterologous nucleotide 
sequence encoding a foreign gene or fragment thereof. 

5. The BAV system of claim 1 wherein the 
foreign nucleotide sequence is with or without the 
control of an exogenous promoter. 

6. The BAV system of claim 1 wherein (c) a 
system wherein part or all of the E1 gene region and 
part or all of the E3 gene region are deleted and a 



WO 95/16048 



-102- 



PCT/CA94/00678 



heterologous nucleotide sequence encoding a foreign 
gene or fragment thereof is inserted into at least one 
of the deletions. 

7. A recombinant vector system comprising 
the entire BAV genome and a plasmid capable of 
generating a recombinant virus by in vivo 
recombination following cotransfection of a suitable 
cell line comprising the entire BAV genome 
representing the wild-type BAV genome and a plasmid 
comprising an adenovirus left end nucleotide sequences 
containing the E1A gene region or a plasmid comprising 
adenovirus right end sequences containing the E3 gene 
region, the plasmid with a heterologous nucleotide 
sequence encoding a foreign gene or fragment thereof 
substituted for part or all of the E1 and/or E3 gene 
regions, respectively. 

8. A recombinant bovine adenovirus vector 
system comprising two plasmids capable of generating a 
recombinant virus by in vivo recombination following 
cotransfection of a cell line comprising 

(1) a first plasmid comprising the 
entire BAV genome except for a deletion of part or all 
of the E1 and/or E3 gene regions, and 

(2) a second plasmid comprising BAV 
left or right end nucleotide sequences containing the 
E1 or E3 gene regions, respectively, having a 
heterologous nucleotide sequence encoding a foreign 
gene or fragment thereof inserted for the deletion of 
a part or all of the E1 or E3 gene regions. 

9. A live viable recombinant bovine 
adenovirus (BAV) comprising a deletion of part or all 
of the E1 gene region, a deletion of part or all of 
the E3 gene region or deletion of both, and inserted 
into at least one deletion a heterologous nucleotide 
sequence coding for a polypeptide or an antigenic 
determinant produced by a disease causing organism. 




10. A live viable recombinant bovine 
adenovirus (BAV) for producing an immune response in a 
mammalian host comprising: 

(1) a live bovine adenovirus (BAV) 
modified to contain a heterologous nucleotide sequence 
coding for a polypeptide or an antigenic determinant 
corresponding to the desired immune response in 
association with or without 

(2) an effective promoter for said 
nucleotide sequence to provide expression of said 
antigenic determinant in immunogenic non-pathogenic 
quantities. 

11. A live recombinant bovine adenovirus 
expression system comprising a deletion of all or part 
of the E1 gene region or all or part of the E3 gene 
region, or both deletions and inserted in at least one 
deletion a heterologous nucleotide sequence coding for 
a foreign gene or fragment thereof under control of an 
expression promoter with or without one or more 
polyadenylation signal. 



12. A recombinant mammalian cell line 
comprising bovine adenovirus (BAV) E1 gene region, 
said recombinant cell line thereby capable of allowing 
replication therein of a bovine adenovirus comprising 
an E1 deletion which may or may not be replaced by a 
heterologous or homologous nucleotide sequence 
encoding a foreign gene or fragment thereof. 

13. The cell line of claim 12 which is a 
bovine cell line. 

14. The recombinant mammalian cell line of 
claim 12 wherein the heterologous or homologous 
nucleotide sequence encoding the foreign gene or 
fragment thereof is selected from the group consisting 
of a bovine adenovirus (BAV) E1 polypeptide, a BAV 
E1-associated polypeptide, a growth factor, a cellular 
receptor or other cellular polypeptide. 

15. A recombinant mammalian cell line 
comprising bovine adenovirus E1 genes, said
recombinant cell line thereby capable of allowing DNA-mediated
transfection to generate a recombinant bovine
adenovirus (BAV) selected from the group consisting
of:

(a) a recombinant BAV wherein part or all of
the E1 gene region is replaced by a heterologous
nucleotide sequence encoding a foreign gene or
fragment thereof,

(b) a recombinant BAV wherein part or all of
the E3 gene region is replaced by a heterologous
nucleotide sequence encoding a foreign gene or
fragment thereof,

(c) a recombinant BAV wherein part or all of
the E1 gene region and part or all of the E3 gene
region are deleted and inserted into at least one
deletion a heterologous nucleotide sequence encoding a
foreign gene or fragment thereof,

(d) a recombinant BAV wherein part or all of
the E1 gene region and/or part or all of the E3 gene
region are deleted and inserted into at least one
deletion a heterologous nucleotide sequence encoding
more than one foreign gene or fragment thereof to
produce a recombinant fusion protein, and

(e) a mutant BAV wherein part or all of the
E1 gene region and/or part or all of the E3 gene
region are deleted.

16. A method of preparing a recombinant
polypeptide or fragment thereof which comprises:

(1) infecting the mammalian cell line
of claim 12 with a recombinant bovine adenovirus
comprising a deletion of part or all of the E1 gene
region and/or part or all of the E3 gene region and
inserted into at least one deletion a heterologous
nucleotide sequence encoding the polypeptide or
fragment thereof,

(2) replicating the recombinant virus
in a recombinant cell line under conditions to provide
for expression of the polypeptide, and

(3) recovering the recombinant
polypeptide or antigenic fragment thereof.

17. A method of isolating a polypeptide 
which comprises:

(1) replicating a recombinant
mammalian cell line of claim 12 under conditions to
provide for expression of the polypeptide, and

(2) recovering the polypeptide or
fragment thereof.

18. A method for eliciting an immune
response in a mammalian host to protect against an
infection comprising:

administering a vaccine composition
comprising a live recombinant BAV of claim 1 wherein
the foreign gene or fragment encodes an antigen with
or without a pharmaceutically acceptable carrier.

19. A method for eliciting an immune 
response in a mammalian host to protect against an 
infection comprising: 

administering a vaccine comprising a 
recombinant polypeptide or fragment thereof prepared
by a method of claim 16 with or without a 
pharmaceutically acceptable carrier. 

20. A vaccine for protecting a mammalian 
host against infection comprising a live recombinant
adenovirus of claim 1 wherein the foreign gene or
fragment encodes an antigen with or without a 
pharmaceutically acceptable carrier.

21. A vaccine for protecting a mammalian
host against infection comprising a recombinant
antigen prepared by a method of claim 16 with or
without a pharmaceutically acceptable carrier.

22. A mutant bovine adenovirus (BAV) 
comprising a deletion of part or all of E1 and/or a
deletion of part or all of E3. 

23. A method for providing gene therapy to
a mammal in need thereof to control a gene deficiency
which comprises administering to said mammal a live
recombinant bovine adenovirus containing a foreign
nucleotide sequence encoding a non-defective form of
said gene under conditions wherein the recombinant
virus vector genome is incorporated into said
mammalian genome or is maintained independently and
extrachromosomally to provide expression of the
required gene in the target organ or tissue.



Ad5   120  IleAspLeuThrCysHisGluAlaGlyPheProProSer  132
            :  |  |     |  |  |           |  |  |
BAV3   26  ValAspLeuGluCysHisGluVal   LeuProProSer   37

FIG. 2B

Ad5    82  LeuAspPheSerThrProGlyArgAlaAlaAlaAlaValAlaPheLeuSerPheIle  100

FIG. 3A

Ad5    20  GlnSerSerAsnSerThrSer  26
            |  |  |     |  |  |
           GlnSerSerArgSerThrSer